LLM Techniques

Oct 02, 2025
Practical LLM Security Advice from the NVIDIA AI Red Team
Over the last several years, the NVIDIA AI Red Team (AIRT) has evaluated numerous and diverse AI-enabled systems for potential vulnerabilities and security...
8 MIN READ

Sep 23, 2025
Faster Training Throughput in FP8 Precision with NVIDIA NeMo
In previous posts on FP8 training, we explored the fundamentals of FP8 precision and took a deep dive into the various scaling recipes for practical large-scale...
12 MIN READ

Sep 17, 2025
An Introduction to Speculative Decoding for Reducing Latency in AI Inference
Generating text with large language models (LLMs) often runs into a fundamental bottleneck. GPUs offer massive compute, yet much of that power sits...
11 MIN READ
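The idea behind speculative decoding can be illustrated with a toy sketch: a cheap draft model proposes several tokens ahead, and the large target model verifies them in one pass, keeping the longest agreeing run. Everything below (the lookup-table "models", greedy verification instead of probabilistic acceptance) is a simplifying assumption, not the method from the post.

```python
def draft_model(prefix, k):
    # Hypothetical cheap draft model: proposes the next k tokens greedily.
    guesses = {"the": "quick", "quick": "brown", "brown": "fox"}
    out, cur = [], prefix[-1]
    for _ in range(k):
        cur = guesses.get(cur, "<eos>")
        out.append(cur)
    return out

def target_model_next(prefix):
    # Hypothetical large target model: the token it would emit after prefix.
    truth = {"the": "quick", "quick": "brown", "brown": "dog"}
    return truth.get(prefix[-1], "<eos>")

def speculative_step(prefix, k=3):
    # Verify the draft's k tokens against the target; accept matches,
    # and at the first disagreement substitute the target's own token.
    draft = draft_model(prefix, k)
    accepted = []
    for tok in draft:
        expected = target_model_next(prefix + accepted)
        if tok == expected:
            accepted.append(tok)
        else:
            accepted.append(expected)  # target's correction ends the step
            break
    return accepted

print(speculative_step(["the"]))  # draft proposes quick, brown, fox; target corrects fox -> dog
```

When the draft agrees often, each verification pass yields several tokens for roughly the cost of one target-model step, which is where the latency reduction comes from.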

Sep 11, 2025
How Quantization Aware Training Enables Low-Precision Accuracy Recovery
After training AI models, a variety of compression techniques can be used to optimize them for deployment. The most common is post-training quantization (PTQ),...
10 MIN READ
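Post-training quantization, mentioned in the teaser above, can be sketched in a few lines: map float weights to int8 with a single scale factor and measure the round-trip error. This is a minimal symmetric per-tensor scheme for illustration only; production PTQ (and the quantization-aware training the post covers) involves calibration and per-channel scales.

```python
def quantize_int8(weights):
    # Symmetric per-tensor int8 quantization: scale chosen from max |w|.
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from int8 values.
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.03, 0.89]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q)       # int8 codes
print(approx)  # reconstructed weights, close to the originals
```

The gap between `weights` and `approx` is the quantization error that QAT aims to recover by simulating this rounding during training.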

Sep 02, 2025
Cut Model Deployment Costs While Keeping Performance With GPU Memory Swap
Deploying large language models (LLMs) at scale presents a dual challenge: ensuring fast responsiveness during high demand, while managing the costs of GPUs....
6 MIN READ

Aug 13, 2025
Scaling LLM Reinforcement Learning with Prolonged Training Using ProRL v2
Currently, one of the most compelling questions in AI is whether large language models (LLMs) can continue to improve through sustained reinforcement learning...
8 MIN READ

Aug 07, 2025
How Hackers Exploit AI's Problem-Solving Instincts
As multimodal AI models advance from perception to reasoning, and even start acting autonomously, new attack surfaces emerge. These threats don’t just target...
10 MIN READ

Jun 18, 2025
LLM Inference Benchmarking: How Much Does Your LLM Inference Cost?
This is the fourth post in the large language model latency-throughput benchmarking series, which aims to instruct developers on how to determine the cost of...
10 MIN READ

Jun 02, 2025
Scaling to Millions of Tokens with Efficient Long-Context LLM Training
The evolution of large language models (LLMs) has been marked by significant advancements in their ability to process and generate text. Among these...
7 MIN READ

May 27, 2025
Advanced Optimization Strategies for LLM Training on NVIDIA Grace Hopper
In the previous post, Profiling LLM Training Workflows on NVIDIA Grace Hopper, we explored the importance of profiling large language model (LLM) training...
10 MIN READ

May 23, 2025
Stream Smarter and Safer: Learn How NVIDIA NeMo Guardrails Enhance LLM Output Streaming
LLM streaming sends a model's response incrementally in real time, token by token, as it's being generated. The output streaming capability has evolved...
8 MIN READ
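Token-by-token streaming, as described in the teaser above, maps naturally onto a Python generator: the consumer handles each token as it arrives instead of waiting for the full response. The stand-in token source below is a hypothetical placeholder, not the NeMo Guardrails API.

```python
def fake_llm_tokens():
    # Stand-in for a model's incremental decode loop (hypothetical output).
    for tok in ["Streaming", " sends", " tokens", " as", " they", " are", " generated", "."]:
        yield tok

def stream_response():
    # Consume tokens one at a time; a real app would flush each chunk
    # to the client here (e.g. over SSE or a websocket) as it arrives.
    chunks = []
    for tok in fake_llm_tokens():
        chunks.append(tok)
    return "".join(chunks)

print(stream_response())
```

Guardrailing a stream is harder than guardrailing a finished response precisely because checks must run on these partial chunks before the full output exists.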

Feb 25, 2025
Defining LLM Red Teaming
There is an activity where people provide inputs to generative AI technologies, such as large language models (LLMs), to see if the outputs can be made to...
10 MIN READ

Feb 25, 2025
Agentic Autonomy Levels and Security
Agentic workflows are the next evolution in AI-powered tools. They enable developers to chain multiple AI models together to perform complex activities, enable...
14 MIN READ

Feb 12, 2025
LLM Model Pruning and Knowledge Distillation with NVIDIA NeMo Framework
Model pruning and knowledge distillation are powerful, cost-effective strategies for obtaining smaller language models from an initial larger sibling...
10 MIN READ

Jan 29, 2025
Mastering LLM Techniques: Evaluation
Evaluating large language models (LLMs) and retrieval-augmented generation (RAG) systems is a complex and nuanced process, reflecting the sophisticated and...
12 MIN READ

Jan 16, 2025
Continued Pretraining of State-of-the-Art LLMs for Sovereign AI and Regulated Industries with Domyn and NVIDIA DGX Cloud
In recent years, large language models (LLMs) have achieved extraordinary progress in areas such as reasoning, code generation, machine translation, and...
17 MIN READ