
Advancing Agentic AI with NVIDIA Nemotron Open Reasoning Models

As AI progresses toward greater autonomy, the emergence of AI agents capable of independent decision-making marks a significant milestone. To function effectively in complex, real-world environments, these agents must go beyond pattern recognition and statistical prediction. This shift is driven by the adoption of reasoning models, which are designed to process information, apply logic, and make decisions, enabling more intelligent and adaptable behavior.

By combining structured thinking with contextual awareness, reasoning models provide the cognitive foundation for agents to navigate dynamic tasks with human-like understanding.

Enterprises need advanced reasoning models that they fully control and can run on any platform to maximize their agents’ capabilities. To accelerate enterprise adoption of AI agents, NVIDIA is building the NVIDIA Nemotron family of open models. These models achieve leading accuracy for reasoning and agentic tasks and deliver the highest compute efficiency among open reasoning models across accelerated computing, from edge to data center and cloud.

This post explains the process of building Nemotron models, which begins with the best available foundation models. These models are then augmented for reasoning and agentic performance and optimized for compute efficiency, throughput, and latency.

Leading accuracy, higher throughput, and lower TCO 

To create a Nemotron model, the team starts with an open frontier model and performs a series of key steps, as shown in Table 1 and Figure 1.

| Technique | Description | Purpose/Benefit |
| --- | --- | --- |
| Neural Architecture Search (NAS) | Automatically explores model designs that balance accuracy, latency, and efficiency for LLMs like Llama, enabling agentic AI at scale. | Optimizes model structure for the best trade-off between performance and efficiency. |
| Knowledge Distillation | Uses synthetic data generation (SDG) during multiple stages of training, along with curated high-quality data, to transfer reasoning skills from large models to smaller, faster ones. | Creates efficient models with strong reasoning capabilities at reduced computational expense. |
| Supervised Fine-Tuning | Trains models with a mix of reasoning and nonreasoning data, helping them adapt responses based on the task type. | Improves model adaptability and response quality across diverse tasks. |
| Reinforcement Learning (RL) | Further refines reasoning quality and performance on nonreasoning tasks by rewarding accurate, structured outputs. | Enhances output quality and task performance beyond supervised learning alone. |

Table 1. Key steps in training NVIDIA Nemotron models
Figure 1. NVIDIA Nemotron model training pipeline
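Of the steps above, knowledge distillation is perhaps the easiest to make concrete. A minimal sketch of the classic temperature-scaled distillation loss, where a student is trained to match the teacher's softened output distribution (illustrative only, not the actual Nemotron training code):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T produces softer distributions."""
    z = [x / temperature for x in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    as in the classic distillation formulation."""
    p = softmax(teacher_logits, temperature)  # soft targets from the large model
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi + 1e-12) - math.log(qi + 1e-12))
             for pi, qi in zip(p, q))
    return kl * temperature ** 2

# A student whose logits track the teacher's incurs a smaller loss.
near = distillation_loss([3.8, 1.1, 0.4], [4.0, 1.0, 0.5])
far = distillation_loss([0.2, 3.0, 2.5], [4.0, 1.0, 0.5])
```

Minimizing this loss (usually mixed with a standard cross-entropy term) pushes the smaller student toward the teacher's behavior at a fraction of the compute cost.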

As a result of these optimization techniques, Nemotron models achieve leading accuracy while significantly reducing model size, providing higher throughput. This lowers overall TCO, making them ideal for enterprise use. As shown in Figure 2, previously released Llama Nemotron models provide up to 5x higher throughput compared to other leading open models.
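To see why higher throughput translates into lower TCO, a back-of-the-envelope sizing with made-up numbers (not published benchmarks):

```python
import math

def gpus_needed(target_tok_per_s, tok_per_s_per_gpu):
    """Smallest GPU count that meets the target aggregate throughput."""
    return math.ceil(target_tok_per_s / tok_per_s_per_gpu)

# Illustrative figures only: serving 100,000 tokens/s with a model that
# delivers 5x the per-GPU throughput needs roughly 1/5 the GPUs.
baseline = gpus_needed(100_000, 1_000)   # 100 GPUs
optimized = gpus_needed(100_000, 5_000)  # 20 GPUs
```

Fewer GPUs for the same workload means proportionally lower hardware, power, and hosting costs, which is where the TCO savings come from.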

Figure 2. Comparison of average accuracy versus throughput for Llama Nemotron Super model

Model builders across Europe adopt NVIDIA Nemotron

At GTC Paris, NVIDIA announced collaborations with several prominent sovereign AI model developers across Europe—including those in France, Germany, Italy, Luxembourg, Poland, Spain and Sweden—to create optimized versions of their models.

Nemotron models are also available as NVIDIA NIM inference microservices, optimized for high throughput and low latency. NVIDIA NIM delivers seamless, scalable AI inferencing, on-premises or in the cloud, leveraging industry-standard APIs.

Announcing Mistral-Nemotron, a state-of-the-art model for AI agents

New to the Nemotron family, the Mistral-Nemotron model is a significant advancement for enterprise agentic AI. Mistral-Nemotron is a turbo model, offering significant compute efficiency combined with high accuracy to meet the demanding needs of enterprise-ready AI agents. 

Mistral-Nemotron is designed for a wide range of professional applications and excels in coding and instruction following. It performs well across domains, including software development and customer service. Mistral-Nemotron also excels in tool calling, making it ideal for building agents in enterprise applications. 

The Mistral-Nemotron model is available as a NIM microservice, offering high throughput and low latency. You can download the NIM microservice and deploy it anywhere, from on-premises to the cloud.
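Because NIM microservices expose industry-standard APIs, a request to a deployed endpoint can follow the familiar chat-completions format, including a tool schema for agentic use. A minimal sketch of building such a request; the URL, model id, and tool definition below are illustrative assumptions, not documented values:

```python
import json

# Hypothetical local NIM endpoint and model id -- check the NIM
# documentation for the actual values for your deployment.
NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "mistral-nemotron",  # assumed model id
    "messages": [
        {"role": "system", "content": "You are a customer-service agent."},
        {"role": "user", "content": "What is the status of order 1234?"},
    ],
    # Tool the model may call, declared in the standard function format.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_order_status",
                "description": "Look up an order's shipping status.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            },
        }
    ],
    "temperature": 0.2,
}

body = json.dumps(payload)  # POST this to NIM_URL with any HTTP client
```

When the model decides a tool is needed, the response carries a structured tool call (function name plus arguments) that the agent executes before continuing the conversation.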

More leading enterprise-ready Nemotron open models

Enterprise-ready models such as Llama Nemotron Ultra and Llama Nemotron Nano lead other open models of their respective sizes in reasoning, math, and tool calling. The recently announced Llama Nemotron Vision now ranks highest on OCRBench V2 for visual reasoning and document understanding.

The NVIDIA Research team has also introduced AceReasoning Nemotron, which excels in math and coding, and Nemotron-H, a family of hybrid Mamba-Transformer models that deliver high accuracy with significantly faster inference speeds.

The Llama Nemotron Safety Guard V2 is a leading open content safety model that scored the highest in overall average accuracy—81.6%—during NVIDIA testing. It was trained using the Nemotron Content Safety Dataset V2, featuring more than 33K annotated human-LLM interactions. Built on the Llama 3.1 8B Instruct model, it classifies prompts and responses as safe or unsafe and flags violations using the NVIDIA detailed safety risk taxonomy.
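The classify-then-route pattern such a safety model enables can be sketched as follows. The `classify` function here is a toy keyword stand-in for an actual call to the Llama Nemotron Safety Guard model; a real deployment would screen against the full NVIDIA safety risk taxonomy:

```python
def classify(text):
    """Stand-in safety classifier returning 'safe' or 'unsafe'.
    A deployed system would call the Llama Nemotron Safety Guard model here."""
    blocked_topics = {"weapons", "malware"}  # toy taxonomy for illustration
    return "unsafe" if any(t in text.lower() for t in blocked_topics) else "safe"

def guarded_reply(user_prompt, generate):
    """Screen the prompt, generate, then screen the response before returning it."""
    if classify(user_prompt) == "unsafe":
        return "I can't help with that request."
    response = generate(user_prompt)
    if classify(response) == "unsafe":
        return "I can't help with that request."
    return response
```

Screening both the prompt and the generated response is the standard two-sided guardrail layout: it catches unsafe requests before generation and unsafe outputs after it.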

Among agents, Nemotron-CORTEXA stands out as a state-of-the-art software engineering agent designed to resolve real issues in GitHub repositories. It identifies the correct source files and code snippets, generates multiple bug fixes and unit tests, and selects the best solution using an LLM-as-a-judge strategy. It solves 68.2% of issues in the SWE-bench Verified set, combining high resolution accuracy with efficiency.
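The generate-many, test, and judge loop described above amounts to best-of-N selection. A minimal sketch of the selection stage; `select_best_fix`, its arguments, and the ranking heuristic are illustrative assumptions, not the CORTEXA implementation:

```python
def select_best_fix(candidate_patches, unit_tests, judge_score):
    """Prefer patches that pass more of the generated unit tests, breaking
    ties with a judge score (an LLM-as-a-judge call in a real agent)."""
    def rank(patch):
        passed = sum(1 for test in unit_tests if test(patch))
        return (passed, judge_score(patch))
    return max(candidate_patches, key=rank)

# Toy usage: patches are strings, tests are predicates over a patch,
# and the "judge" is a trivial scoring function.
patches = ["fix_a", "fix_b"]
tests = [lambda p: "fix" in p, lambda p: p.endswith("b")]
best = select_best_fix(patches, tests, judge_score=len)
```

Ranking by test passes first and judge score second keeps the cheap, objective signal (do the tests pass?) ahead of the more subjective one.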

The Nemotron team has also open-sourced the datasets used to train the models, which have been trending at the top of Hugging Face leaderboards.

The OpenMathReasoning dataset is designed to train LLMs in advanced mathematical problem-solving. Similarly, the OpenCodeReasoning dataset focuses on enhancing LLM capabilities in code generation and reasoning, comprising competitive programming challenges paired with high-quality solutions generated by models such as DeepSeek-R1.

Nemotron-Personas is an open source dataset of synthetic personas aligned with real U.S. demographic and geographic distributions to reflect population diversity across attributes like age, education, occupation, and ethnicity. Designed using Gretel Data Designer to improve diversity and complexity in synthetic data, as well as to reduce model bias, it supports a variety of domains and use cases.
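Aligning synthetic personas to real population statistics can be sketched as sampling each attribute from its target marginal distribution. The attribute lists and weights below are toy values for illustration, not the dataset's real demographics:

```python
import random

# Toy marginal distributions -- illustrative only.
ATTRIBUTES = {
    "age_band": (["18-29", "30-44", "45-64", "65+"], [0.21, 0.25, 0.32, 0.22]),
    "education": (["high school", "bachelor's", "graduate"], [0.45, 0.35, 0.20]),
}

def sample_persona(rng):
    """Draw one synthetic persona by sampling each attribute from its target
    marginal, so a large sample approximates the population mix."""
    return {attr: rng.choices(values, weights=weights)[0]
            for attr, (values, weights) in ATTRIBUTES.items()}

rng = random.Random(0)
personas = [sample_persona(rng) for _ in range(10_000)]
```

Sampling attributes independently only matches the marginals; a production pipeline like the one behind Nemotron-Personas would also need to respect correlations between attributes (for example, age and education).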

Get started with NVIDIA Nemotron models

Try the Mistral-Nemotron NIM directly from your browser. Stay tuned for a downloadable NIM coming soon. You can also access the previously released Llama Nemotron models and training datasets:
