LLM Techniques

Oct 02, 2025
Practical LLM Security Advice from the NVIDIA AI Red Team
Over the last several years, the NVIDIA AI Red Team (AIRT) has evaluated numerous and diverse AI-enabled systems for potential vulnerabilities and security...
8 MIN READ

Sep 23, 2025
Faster Training Throughput in FP8 Precision with NVIDIA NeMo
In previous posts on FP8 training, we explored the fundamentals of FP8 precision and took a deep dive into the various scaling recipes for practical large-scale...
12 MIN READ

Sep 17, 2025
An Introduction to Speculative Decoding for Reducing Latency in AI Inference
Generating text with large language models (LLMs) often runs into a fundamental bottleneck. GPUs offer massive compute, yet much of that power sits...
11 MIN READ
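The idea behind speculative decoding can be illustrated with a toy sketch: a cheap draft model proposes several tokens ahead, and the large target model verifies them in one pass, keeping the longest agreeing run. Everything below (the lookup-table "models", greedy verification instead of probabilistic acceptance) is a simplifying assumption, not the method from the post.

```python
def draft_model(prefix, k):
    # Hypothetical cheap draft model: proposes the next k tokens greedily.
    guesses = {"the": "quick", "quick": "brown", "brown": "fox"}
    out, cur = [], prefix[-1]
    for _ in range(k):
        cur = guesses.get(cur, "<eos>")
        out.append(cur)
    return out

def target_model_next(prefix):
    # Hypothetical large target model: the token it would emit after prefix.
    truth = {"the": "quick", "quick": "brown", "brown": "dog"}
    return truth.get(prefix[-1], "<eos>")

def speculative_step(prefix, k=3):
    # Verify the draft's k tokens against the target; accept matches,
    # and at the first disagreement substitute the target's own token.
    draft = draft_model(prefix, k)
    accepted = []
    for tok in draft:
        expected = target_model_next(prefix + accepted)
        if tok == expected:
            accepted.append(tok)
        else:
            accepted.append(expected)  # target's correction ends the step
            break
    return accepted

print(speculative_step(["the"]))  # draft proposes quick, brown, fox; target corrects fox -> dog
```

When the draft agrees often, each verification pass yields several tokens for roughly the cost of one target-model step, which is where the latency reduction comes from.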

Sep 11, 2025
How Quantization Aware Training Enables Low-Precision Accuracy Recovery
After training AI models, a variety of compression techniques can be used to optimize them for deployment. The most common is post-training quantization (PTQ),...
10 MIN READ
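Post-training quantization, mentioned in the teaser above, can be sketched in a few lines: map float weights to int8 with a single scale factor and measure the round-trip error. This is a minimal symmetric per-tensor scheme for illustration only; production PTQ (and the quantization-aware training the post covers) involves calibration and per-channel scales.

```python
def quantize_int8(weights):
    # Symmetric per-tensor int8 quantization: scale chosen from max |w|.
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from int8 values.
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.03, 0.89]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q)       # int8 codes
print(approx)  # reconstructed weights, close to the originals
```

The gap between `weights` and `approx` is the quantization error that QAT aims to recover by simulating this rounding during training.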

Sep 02, 2025
Cut Model Deployment Costs While Keeping Performance With GPU Memory Swap
Deploying large language models (LLMs) at scale presents a dual challenge: ensuring fast responsiveness during high demand, while managing the costs of GPUs....
6 MIN READ

Aug 13, 2025
Scaling LLM Reinforcement Learning with Prolonged Training Using ProRL v2
Currently, one of the most compelling questions in AI is whether large language models (LLMs) can continue to improve through sustained reinforcement learning...
8 MIN READ

Aug 07, 2025
How Hackers Exploit AI's Problem-Solving Instincts
As multimodal AI models advance from perception to reasoning, and even start acting autonomously, new attack surfaces emerge. These threats don’t just target...
10 MIN READ

Jun 18, 2025
LLM Inference Benchmarking: How Much Does Your LLM Inference Cost?
This is the fourth post in the large language model latency-throughput benchmarking series, which aims to instruct developers on how to determine the cost of...
10 MIN READ

Jun 02, 2025
Scaling to Millions of Tokens with Efficient Long-Context LLM Training
The evolution of large language models (LLMs) has been marked by significant advancements in their ability to process and generate text. Among these...
7 MIN READ

May 27, 2025
Advanced Optimization Strategies for LLM Training on NVIDIA Grace Hopper
In the previous post, Profiling LLM Training Workflows on NVIDIA Grace Hopper, we explored the importance of profiling large language model (LLM) training...
10 MIN READ

May 23, 2025
Stream Smarter and Safer: Learn How NVIDIA NeMo Guardrails Enhance LLM Output Streaming
LLM streaming sends a model's response incrementally in real time, token by token, as it's being generated. The output streaming capability has evolved...
8 MIN READ
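Token-by-token streaming, as described in the teaser above, maps naturally onto a Python generator: the consumer handles each token as it arrives instead of waiting for the full response. The stand-in token source below is a hypothetical placeholder, not the NeMo Guardrails API.

```python
def fake_llm_tokens():
    # Stand-in for a model's incremental decode loop (hypothetical output).
    for tok in ["Streaming", " sends", " tokens", " as", " they", " are", " generated", "."]:
        yield tok

def stream_response():
    # Consume tokens one at a time; a real app would flush each chunk
    # to the client here (e.g. over SSE or a websocket) as it arrives.
    chunks = []
    for tok in fake_llm_tokens():
        chunks.append(tok)
    return "".join(chunks)

print(stream_response())
```

Guardrailing a stream is harder than guardrailing a finished response precisely because checks must run on these partial chunks before the full output exists.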

Feb 25, 2025
Defining LLM Red Teaming
There is an activity where people provide inputs to generative AI technologies, such as large language models (LLMs), to see if the outputs can be made to...
10 MIN READ

Feb 25, 2025
Agentic Autonomy Levels and Security
Agentic workflows are the next evolution in AI-powered tools. They enable developers to chain multiple AI models together to perform complex activities, enable...
14 MIN READ

Feb 12, 2025
LLM Model Pruning and Knowledge Distillation with NVIDIA NeMo Framework
Model pruning and knowledge distillation are powerful, cost-effective strategies for obtaining smaller language models from an initial larger sibling...
10 MIN READ

Jan 29, 2025
Mastering LLM Techniques: Evaluation
Evaluating large language models (LLMs) and retrieval-augmented generation (RAG) systems is a complex and nuanced process, reflecting the sophisticated and...
12 MIN READ

Jan 16, 2025
Continued Pretraining of State-of-the-Art LLMs for Sovereign AI and Regulated Industries with Domyn and NVIDIA DGX Cloud
In recent years, large language models (LLMs) have achieved extraordinary progress in areas such as reasoning, code generation, machine translation, and...
17 MIN READ