LLM Techniques

Oct 20, 2025
Build an AI Agent to Analyze IT Tickets with NVIDIA Nemotron
Modern organizations generate a massive volume of operational data through ticketing systems, incident reports, service requests, support escalations, and more...
11 MIN READ

Oct 15, 2025
Agentic AI Unleashed: Join the AWS & NVIDIA Hackathon
Build the next generation of intelligent, autonomous applications. This isn't just a hackathon; it's your chance to unleash the power of agentic AI and show...
1 MIN READ

Oct 09, 2025
From Assistant to Adversary: Exploiting Agentic AI Developer Tools
Developers are increasingly turning to AI-enabled tools for coding, including Cursor, OpenAI Codex, Claude Code, and GitHub Copilot. While these automation...
10 MIN READ

Oct 07, 2025
Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer
Large language models (LLMs) have set a high bar in natural language processing (NLP) tasks such as coding, reasoning, and math. However, their deployment...
11 MIN READ
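
As a rough illustration of the distillation half of this workflow (not the TensorRT Model Optimizer API itself; all names below are illustrative), a knowledge-distillation step in plain PyTorch trains a smaller, pruned student to match a frozen teacher's softened logits:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.
    The T**2 factor keeps gradient magnitudes comparable across temperatures."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature**2

# Toy usage: a pruned "student" learns from a larger frozen "teacher".
teacher = torch.nn.Linear(64, 10).eval()   # stands in for the full model
student = torch.nn.Linear(64, 10)          # stands in for the pruned model
opt = torch.optim.AdamW(student.parameters(), lr=1e-3)

x = torch.randn(32, 64)
with torch.no_grad():
    t_logits = teacher(x)
loss = distillation_loss(student(x), t_logits)
loss.backward()
opt.step()
```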

Oct 02, 2025
Practical LLM Security Advice from the NVIDIA AI Red Team
Over the last several years, the NVIDIA AI Red Team (AIRT) has evaluated numerous and diverse AI-enabled systems for potential vulnerabilities and security...
8 MIN READ

Sep 23, 2025
Faster Training Throughput in FP8 Precision with NVIDIA NeMo
In previous posts on FP8 training, we explored the fundamentals of FP8 precision and took a deep dive into the various scaling recipes for practical large-scale...
12 MIN READ
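
NeMo's FP8 path builds on NVIDIA Transformer Engine. As a minimal sketch of the delayed-scaling recipe the series discusses (assuming Transformer Engine is installed and an FP8-capable GPU such as Hopper is available; layer sizes and recipe settings here are illustrative):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Delayed scaling: per-tensor scales derived from a history of amax values.
fp8_recipe = DelayedScaling(
    fp8_format=Format.HYBRID,   # E4M3 forward, E5M2 backward
    amax_history_len=16,
    amax_compute_algo="max",
)

layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16)
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16, requires_grad=True)

# GEMMs inside this context run in FP8; master weights stay in higher precision.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
y.float().sum().backward()
```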

Sep 17, 2025
An Introduction to Speculative Decoding for Reducing Latency in AI Inference
Generating text with large language models (LLMs) often runs into a fundamental bottleneck. GPUs offer massive compute, yet much of that power sits...
11 MIN READ
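
The mechanics are easy to see in a toy form. In the sketch below (pure Python, with stand-in next-token functions in place of real models), a cheap draft model proposes a few tokens and the target model keeps the longest agreeing prefix plus one token of its own per step, which is where the latency win comes from. This is the greedy variant; production systems use a probabilistic accept/reject rule.

```python
def draft_next(tokens):      # stand-in for a small, fast draft model
    return (tokens[-1] * 31 + 7) % 100

def target_next(tokens):     # stand-in for the large target model
    return (tokens[-1] * 31 + 7) % 100 if tokens[-1] % 5 else (tokens[-1] + 1) % 100

def speculative_decode(prompt, n_new, k=4):
    """Greedy speculative decoding: draft proposes k tokens, target verifies."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < n_new:
        # 1) Draft k candidate tokens autoregressively (cheap).
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) One target pass scores all k positions; emulated here per position.
        accepted = []
        for i in range(k):
            expected = target_next(tokens + accepted)
            if draft[i] == expected:
                accepted.append(draft[i])     # token verified, keep it
            else:
                accepted.append(expected)     # mismatch: take target's token, stop
                break
        else:
            accepted.append(target_next(tokens + accepted))  # all matched: bonus token
        tokens.extend(accepted)
    return tokens[:len(prompt) + n_new]

print(speculative_decode([42], n_new=10))
```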

Sep 11, 2025
How Quantization Aware Training Enables Low-Precision Accuracy Recovery
After an AI model is trained, a variety of compression techniques can be used to optimize it for deployment. The most common is post-training quantization (PTQ),...
10 MIN READ
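
QAT recovers accuracy by simulating quantization during fine-tuning so the weights adapt to the rounding error. A minimal sketch of the core trick, fake quantization with a straight-through estimator in PyTorch (illustrative, not any particular library's implementation):

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Simulates INT8 round-to-nearest in the forward pass; the backward
    pass is a straight-through estimator, so gradients ignore rounding."""

    @staticmethod
    def forward(ctx, x, scale):
        q = torch.clamp(torch.round(x / scale), -128, 127)  # quantize
        return q * scale                                    # dequantize

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None   # pass gradients straight through

def fake_quant(x):
    scale = x.detach().abs().amax() / 127.0 + 1e-12   # simple per-tensor scale
    return FakeQuantSTE.apply(x, scale)

# Toy QAT step: the layer trains "through" simulated quantization noise.
layer = torch.nn.Linear(16, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(8, 16), torch.randn(8, 4)
out = torch.nn.functional.linear(x, fake_quant(layer.weight), layer.bias)
loss = torch.nn.functional.mse_loss(out, y)
loss.backward()
opt.step()
```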

Sep 02, 2025
Cut Model Deployment Costs While Keeping Performance With GPU Memory Swap
Deploying large language models (LLMs) at scale presents a dual challenge: ensuring fast responsiveness during high demand, while managing the costs of GPUs...
6 MIN READ
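
The idea behind memory swap is straightforward: when a replica is idle, its weights vacate GPU memory for other workloads and are restored on demand. A deliberately simplified PyTorch sketch of that offload/restore cycle (real implementations swap at much finer granularity and overlap transfers):

```python
import time
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 4096)
).cuda()

def swap_out(m):
    """Evict weights to host memory so the GPU can serve other work."""
    m.to("cpu")
    torch.cuda.empty_cache()      # return freed blocks to the driver

def swap_in(m):
    """Restore weights to the GPU before serving a request."""
    m.to("cuda")

swap_out(model)                   # replica idle: GPU memory reclaimed
t0 = time.perf_counter()
swap_in(model)                    # request arrives: pay the restore latency
torch.cuda.synchronize()
print(f"swap-in took {time.perf_counter() - t0:.3f}s")
```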

Aug 13, 2025
Scaling LLM Reinforcement Learning with Prolonged Training Using ProRL v2
Currently, one of the most compelling questions in AI is whether large language models (LLMs) can continue to improve through sustained reinforcement learning...
8 MIN READ

Aug 07, 2025
How Hackers Exploit AI's Problem-Solving Instincts
As multimodal AI models advance from perception to reasoning, and even start acting autonomously, new attack surfaces emerge. These threats don’t just target...
10 MIN READ

Jun 18, 2025
LLM Inference Benchmarking: How Much Does Your LLM Inference Cost?
This is the fourth post in the large language model latency-throughput benchmarking series, which shows developers how to determine the cost of...
10 MIN READ
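
The cost arithmetic at the heart of the series reduces to a simple ratio: dollars per million tokens equals the instance's hourly price divided by tokens served per hour. A quick sketch (the price and throughput below are made-up placeholders, not benchmark results):

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """USD per 1M generated tokens for one instance at a given throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# Hypothetical example: a $4.00/hr instance sustaining 2,500 output tok/s.
print(f"${cost_per_million_tokens(4.00, 2500):.2f} per 1M tokens")  # ≈ $0.44
```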

Jun 02, 2025
Scaling to Millions of Tokens with Efficient Long-Context LLM Training
The evolution of large language models (LLMs) has been marked by significant advancements in their ability to process and generate text. Among these...
7 MIN READ

May 27, 2025
Advanced Optimization Strategies for LLM Training on NVIDIA Grace Hopper
In the previous post, Profiling LLM Training Workflows on NVIDIA Grace Hopper, we explored the importance of profiling large language model (LLM) training...
10 MIN READ

May 23, 2025
Stream Smarter and Safer: Learn how NVIDIA NeMo Guardrails Enhance LLM Output Streaming
LLM streaming sends a model's response incrementally in real time, token by token, as it's being generated. The output streaming capability has evolved...
8 MIN READ
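
To make the streaming mechanics concrete: a server typically yields tokens as they arrive and, with streaming guardrails, validates a buffer of recent output before releasing it. The sketch below is a generic illustration of that buffer-then-check pattern, not the NeMo Guardrails API; `model_stream` and `violates_policy` are hypothetical stand-ins.

```python
from typing import Callable, Iterable, Iterator

def guarded_stream(
    model_stream: Iterable[str],               # hypothetical token iterator from an LLM
    violates_policy: Callable[[str], bool],    # hypothetical moderation check
    chunk_size: int = 8,
) -> Iterator[str]:
    """Buffer tokens and release them chunk by chunk, stopping on a violation."""
    buffer: list[str] = []
    for token in model_stream:
        buffer.append(token)
        if len(buffer) >= chunk_size:
            chunk = "".join(buffer)
            if violates_policy(chunk):
                yield " [response stopped by guardrails]"
                return
            yield chunk
            buffer.clear()
    tail = "".join(buffer)
    if tail and not violates_policy(tail):
        yield tail

# Toy usage with stand-in pieces.
tokens = iter(["Stream", "ing ", "is ", "incr", "emental ", "output."])
for piece in guarded_stream(tokens, violates_policy=lambda s: "forbidden" in s, chunk_size=2):
    print(piece, end="")
```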

Feb 25, 2025
Defining LLM Red Teaming
There is an activity where people provide inputs to generative AI technologies, such as large language models (LLMs), to see if the outputs can be made to...
10 MIN READ