LLM Techniques

Oct 20, 2025
Build an AI Agent to Analyze IT Tickets with NVIDIA Nemotron
Modern organizations generate a massive volume of operational data through ticketing systems, incident reports, service requests, support escalations, and more...
11 MIN READ

Oct 15, 2025
Agentic AI Unleashed: Join the AWS & NVIDIA Hackathon
Build the next generation of intelligent, autonomous applications. This isn't just a hackathon; it's your chance to unleash the power of agentic AI and show...
1 MIN READ

Oct 09, 2025
From Assistant to Adversary: Exploiting Agentic AI Developer Tools
Developers are increasingly turning to AI-enabled tools for coding, including Cursor, OpenAI Codex, Claude Code, and GitHub Copilot. While these automation...
10 MIN READ

Oct 07, 2025
Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer
Large language models (LLMs) have set a high bar in natural language processing (NLP) tasks such as coding, reasoning, and math. However, their deployment...
11 MIN READ
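
As a rough illustration of the distillation half of this workflow (not the TensorRT Model Optimizer API itself; all names below are illustrative), a knowledge-distillation step in plain PyTorch trains a smaller, pruned student to match a frozen teacher's softened logits:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.
    The T**2 factor keeps gradient magnitudes comparable across temperatures."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature**2

# Toy usage: a pruned "student" learns from a larger frozen "teacher".
teacher = torch.nn.Linear(64, 10).eval()   # stands in for the full model
student = torch.nn.Linear(64, 10)          # stands in for the pruned model
opt = torch.optim.AdamW(student.parameters(), lr=1e-3)

x = torch.randn(32, 64)
with torch.no_grad():
    t_logits = teacher(x)
loss = distillation_loss(student(x), t_logits)
loss.backward()
opt.step()
```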

Oct 02, 2025
Practical LLM Security Advice from the NVIDIA AI Red Team
Over the last several years, the NVIDIA AI Red Team (AIRT) has evaluated numerous and diverse AI-enabled systems for potential vulnerabilities and security...
8 MIN READ

Sep 23, 2025
Faster Training Throughput in FP8 Precision with NVIDIA NeMo
In previous posts on FP8 training, we explored the fundamentals of FP8 precision and took a deep dive into the various scaling recipes for practical large-scale...
12 MIN READ
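
NeMo's FP8 path builds on NVIDIA Transformer Engine. As a minimal sketch of the delayed-scaling recipe the series discusses (assuming Transformer Engine is installed and an FP8-capable GPU such as Hopper is available; layer sizes and recipe settings here are illustrative):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Delayed scaling: per-tensor scales derived from a history of amax values.
fp8_recipe = DelayedScaling(
    fp8_format=Format.HYBRID,   # E4M3 forward, E5M2 backward
    amax_history_len=16,
    amax_compute_algo="max",
)

layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16)
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16, requires_grad=True)

# GEMMs inside this context run in FP8; master weights stay in higher precision.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
y.float().sum().backward()
```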

Sep 17, 2025
An Introduction to Speculative Decoding for Reducing Latency in AI Inference
Generating text with large language models (LLMs) often runs into a fundamental bottleneck. GPUs offer massive compute, yet much of that power sits...
11 MIN READ
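
The mechanics are easy to see in a toy form. In the sketch below (pure Python, with stand-in next-token functions in place of real models), a cheap draft model proposes a few tokens and the target model keeps the longest agreeing prefix plus one token of its own per step, which is where the latency win comes from. This is the greedy variant; production systems use a probabilistic accept/reject rule.

```python
def draft_next(tokens):      # stand-in for a small, fast draft model
    return (tokens[-1] * 31 + 7) % 100

def target_next(tokens):     # stand-in for the large target model
    return (tokens[-1] * 31 + 7) % 100 if tokens[-1] % 5 else (tokens[-1] + 1) % 100

def speculative_decode(prompt, n_new, k=4):
    """Greedy speculative decoding: draft proposes k tokens, target verifies."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < n_new:
        # 1) Draft k candidate tokens autoregressively (cheap).
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) One target pass scores all k positions; emulated here per position.
        accepted = []
        for i in range(k):
            expected = target_next(tokens + accepted)
            if draft[i] == expected:
                accepted.append(draft[i])     # token verified, keep it
            else:
                accepted.append(expected)     # mismatch: take target's token, stop
                break
        else:
            accepted.append(target_next(tokens + accepted))  # all matched: bonus token
        tokens.extend(accepted)
    return tokens[:len(prompt) + n_new]

print(speculative_decode([42], n_new=10))
```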

Sep 11, 2025
How Quantization Aware Training Enables Low-Precision Accuracy Recovery
After an AI model is trained, a variety of compression techniques can be used to optimize it for deployment. The most common is post-training quantization (PTQ),...
10 MIN READ
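
QAT recovers accuracy by simulating quantization during fine-tuning so the weights adapt to the rounding error. A minimal sketch of the core trick, fake quantization with a straight-through estimator in PyTorch (illustrative, not any particular library's implementation):

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Simulates INT8 round-to-nearest in the forward pass; the backward
    pass is a straight-through estimator, so gradients ignore rounding."""

    @staticmethod
    def forward(ctx, x, scale):
        q = torch.clamp(torch.round(x / scale), -128, 127)  # quantize
        return q * scale                                    # dequantize

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None   # pass gradients straight through

def fake_quant(x):
    scale = x.detach().abs().amax() / 127.0 + 1e-12   # simple per-tensor scale
    return FakeQuantSTE.apply(x, scale)

# Toy QAT step: the layer trains "through" simulated quantization noise.
layer = torch.nn.Linear(16, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(8, 16), torch.randn(8, 4)
out = torch.nn.functional.linear(x, fake_quant(layer.weight), layer.bias)
loss = torch.nn.functional.mse_loss(out, y)
loss.backward()
opt.step()
```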

Sep 02, 2025
Cut Model Deployment Costs While Keeping Performance With GPU Memory Swap
Deploying large language models (LLMs) at scale presents a dual challenge: ensuring fast responsiveness during high demand, while managing the costs of GPUs...
6 MIN READ
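
The idea behind memory swap is straightforward: when a replica is idle, its weights vacate GPU memory for other workloads and are restored on demand. A deliberately simplified PyTorch sketch of that offload/restore cycle (real implementations swap at much finer granularity and overlap transfers):

```python
import time
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 4096)
).cuda()

def swap_out(m):
    """Evict weights to host memory so the GPU can serve other work."""
    m.to("cpu")
    torch.cuda.empty_cache()      # return freed blocks to the driver

def swap_in(m):
    """Restore weights to the GPU before serving a request."""
    m.to("cuda")

swap_out(model)                   # replica idle: GPU memory reclaimed
t0 = time.perf_counter()
swap_in(model)                    # request arrives: pay the restore latency
torch.cuda.synchronize()
print(f"swap-in took {time.perf_counter() - t0:.3f}s")
```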

Aug 13, 2025
Scaling LLM Reinforcement Learning with Prolonged Training Using ProRL v2
Currently, one of the most compelling questions in AI is whether large language models (LLMs) can continue to improve through sustained reinforcement learning...
8 MIN READ

Aug 07, 2025
How Hackers Exploit AI's Problem-Solving Instincts
As multimodal AI models advance from perception to reasoning, and even start acting autonomously, new attack surfaces emerge. These threats don’t just target...
10 MIN READ

Jun 18, 2025
LLM Inference Benchmarking: How Much Does Your LLM Inference Cost?
This is the fourth post in the large language model latency-throughput benchmarking series, which shows developers how to determine the cost of...
10 MIN READ
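
The cost arithmetic at the heart of the series reduces to a simple ratio: dollars per million tokens equals the instance's hourly price divided by tokens served per hour. A quick sketch (the price and throughput below are made-up placeholders, not benchmark results):

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """USD per 1M generated tokens for one instance at a given throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# Hypothetical example: a $4.00/hr instance sustaining 2,500 output tok/s.
print(f"${cost_per_million_tokens(4.00, 2500):.2f} per 1M tokens")  # ≈ $0.44
```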

Jun 02, 2025
Scaling to Millions of Tokens with Efficient Long-Context LLM Training
The evolution of large language models (LLMs) has been marked by significant advancements in their ability to process and generate text. Among these...
7 MIN READ

May 27, 2025
Advanced Optimization Strategies for LLM Training on NVIDIA Grace Hopper
In the previous post, Profiling LLM Training Workflows on NVIDIA Grace Hopper, we explored the importance of profiling large language model (LLM) training...
10 MIN READ

May 23, 2025
Stream Smarter and Safer: Learn how NVIDIA NeMo Guardrails Enhance LLM Output Streaming
LLM streaming sends a model's response incrementally in real time, token by token, as it's being generated. The output streaming capability has evolved...
8 MIN READ
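
To make the streaming mechanics concrete: a server typically yields tokens as they arrive and, with streaming guardrails, validates a buffer of recent output before releasing it. The sketch below is a generic illustration of that buffer-then-check pattern, not the NeMo Guardrails API; `model_stream` and `violates_policy` are hypothetical stand-ins.

```python
from typing import Callable, Iterable, Iterator

def guarded_stream(
    model_stream: Iterable[str],               # hypothetical token iterator from an LLM
    violates_policy: Callable[[str], bool],    # hypothetical moderation check
    chunk_size: int = 8,
) -> Iterator[str]:
    """Buffer tokens and release them chunk by chunk, stopping on a violation."""
    buffer: list[str] = []
    for token in model_stream:
        buffer.append(token)
        if len(buffer) >= chunk_size:
            chunk = "".join(buffer)
            if violates_policy(chunk):
                yield " [response stopped by guardrails]"
                return
            yield chunk
            buffer.clear()
    tail = "".join(buffer)
    if tail and not violates_policy(tail):
        yield tail

# Toy usage with stand-in pieces.
tokens = iter(["Stream", "ing ", "is ", "incr", "emental ", "output."])
for piece in guarded_stream(tokens, violates_policy=lambda s: "forbidden" in s, chunk_size=2):
    print(piece, end="")
```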

Feb 25, 2025
Defining LLM Red Teaming
There is an activity where people provide inputs to generative AI technologies, such as large language models (LLMs), to see if the outputs can be made to...
10 MIN READ