As AI progresses toward greater autonomy, the emergence of AI agents capable of independent decision-making marks a significant milestone. To function effectively in complex, real-world environments, these agents must go beyond pattern recognition and statistical prediction. That capability comes from reasoning models, which are designed to process information, apply logic, and make decisions, enabling more intelligent and adaptable behavior.
By combining structured thinking with contextual awareness, reasoning models provide the cognitive foundation for agents to navigate dynamic tasks with human-like understanding.
Enterprises need advanced reasoning models with full control running on any platform to maximize their agents’ capabilities. To accelerate enterprise adoption of AI agents, NVIDIA is building the NVIDIA Nemotron family of open models. These models achieve leading accuracy for reasoning and agentic tasks and deliver the highest compute efficiency among open reasoning models across accelerated computing, from edge to data center and cloud.
This post explains the process of building Nemotron models, which begins with the best available foundation models. These models are then enhanced for reasoning and agentic performance, and optimized for compute efficiency, throughput, and latency.
Leading accuracy, higher throughput, and lower TCO
To create a Nemotron model, the team starts with an open frontier model and performs a series of key steps, as shown in Table 1 and Figure 1.
| Technique | Description | Purpose/Benefit |
|---|---|---|
| Neural Architecture Search (NAS) | Automatically explores model designs that balance accuracy, latency, and efficiency for LLMs like Llama, enabling agentic AI at scale. | Optimizes model structure for the best trade-off between performance and efficiency. |
| Knowledge Distillation | Uses synthetic data generation (SDG) at multiple training stages, along with curated high-quality data, to transfer reasoning skills from large models to smaller, faster ones. | Creates efficient models with strong reasoning capabilities at reduced computational expense. |
| Supervised Fine-Tuning (SFT) | Trains models on a mix of reasoning and non-reasoning data, helping them adapt responses based on the task type. | Improves model adaptability and response quality across diverse tasks. |
| Reinforcement Learning (RL) | Further refines reasoning quality and performance on non-reasoning tasks by rewarding accurate, structured outputs. | Enhances output quality and task performance beyond supervised learning alone. |

As a result of these optimization techniques, Nemotron models achieve leading accuracy while significantly reducing model size, thereby providing higher throughput. This lowers overall TCO, making the models well suited for enterprise use. As shown in Figure 2, previously released Llama Nemotron models provide up to 5x higher throughput than other leading open models.
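The post does not publish the exact distillation objective, but the classic formulation of knowledge distillation minimizes the KL divergence between temperature-softened teacher and student output distributions. The following is a minimal NumPy sketch of that standard loss, purely for illustration; the function names and the temperature value are assumptions, not NVIDIA's implementation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions,
    the core objective for transferring knowledge from a large teacher
    model to a smaller, faster student."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature))
    # KL(teacher || student), scaled by T^2 as in the standard formulation
    kl = np.sum(p_teacher * (np.log(p_teacher) - log_p_student))
    return float(kl * temperature**2)

# Toy check: a student that matches the teacher incurs ~zero loss,
# while a mismatched student is penalized.
teacher = np.array([2.0, 1.0, 0.1])
assert distillation_loss(teacher.copy(), teacher) < 1e-9
assert distillation_loss(np.array([0.1, 1.0, 2.0]), teacher) > 0.0
```

In practice this loss is usually combined with a standard cross-entropy term on ground-truth labels or curated SDG data, weighting the two objectives against each other.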

Model builders across Europe adopt NVIDIA Nemotron
At GTC Paris, NVIDIA announced collaborations with several prominent sovereign AI model developers across Europe—including those in France, Germany, Italy, Luxembourg, Poland, Spain and Sweden—to create optimized versions of their models.
Nemotron models are also available as NVIDIA NIM inference microservices, optimized for high throughput and low latency. NVIDIA NIM delivers seamless, scalable AI inferencing, on-premises or in the cloud, leveraging industry-standard APIs.
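Because NIM microservices expose industry-standard (OpenAI-compatible) APIs, a chat request is an ordinary JSON POST to a `/v1/chat/completions` endpoint. The sketch below builds such a payload; the base URL, port, and model identifier are placeholder assumptions you would replace with your own deployment's values.

```python
import json

# Placeholder values -- substitute your own NIM endpoint and model name.
NIM_BASE_URL = "http://localhost:8000/v1"   # assumed local deployment URL
MODEL_NAME = "mistral-nemotron"             # placeholder model identifier

def build_chat_request(prompt, max_tokens=256, temperature=0.2):
    """Build an OpenAI-compatible chat-completions payload for a NIM endpoint."""
    return {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_request("Summarize the Nemotron model family in one sentence.")
print(json.dumps(payload, indent=2))
# To send the request, POST the payload to the deployed endpoint, e.g.:
# requests.post(f"{NIM_BASE_URL}/chat/completions", json=payload, timeout=60)
```

Because the API shape is standard, existing OpenAI-compatible client libraries can typically be pointed at a NIM deployment by changing only the base URL and model name.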
Announcing Mistral-Nemotron, a state-of-the-art model for AI agents
New to the Nemotron family, the Mistral-Nemotron model is a significant advancement for enterprise agentic AI. Mistral-Nemotron is a turbo model, offering significant compute efficiency combined with high accuracy to meet the demanding needs of enterprise-ready AI agents.
Mistral-Nemotron is designed for a wide range of professional applications and excels in coding and instruction following. It performs well across domains, including software development and customer service. Mistral-Nemotron also excels in tool calling, making it ideal for building agents in enterprise applications.
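Tool calling works by passing the model a schema describing each available function; the model then decides when to emit a structured call. Below is an illustrative tool definition in the widely used OpenAI-style format that OpenAI-compatible endpoints accept; the tool name, fields, and model identifier are hypothetical examples, not from the post.

```python
import json

# Hypothetical customer-service tool, defined in OpenAI-style
# function-calling format. All names here are illustrative.
get_order_status_tool = {
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The customer's order identifier.",
                },
            },
            "required": ["order_id"],
        },
    },
}

# The tool list is passed alongside the chat messages; the model
# responds either with text or with a structured call to the tool.
request_body = {
    "model": "mistral-nemotron",  # placeholder model name
    "messages": [{"role": "user", "content": "Where is order 8472?"}],
    "tools": [get_order_status_tool],
}
print(json.dumps(request_body, indent=2))
```

The agent framework executes any tool call the model emits and feeds the result back as a new message, letting the model continue the conversation with real data.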
The Mistral-Nemotron model is available as a NIM microservice, offering high throughput and low latency. You can download the NIM microservice and deploy it anywhere, from on-premises to the cloud.
More leading enterprise-ready Nemotron open models
Enterprise-ready models such as Llama Nemotron Ultra and Llama Nemotron Nano lead other open models of their respective sizes in reasoning, math, and tool calling. The recently announced Llama Nemotron Vision now ranks highest on OCRBench V2 for visual reasoning and document understanding.
The NVIDIA Research team has also introduced AceReasoning Nemotron, which excels in math and coding, and Nemotron-H, a family of hybrid Mamba-Transformer models that deliver high accuracy with significantly faster inference speeds.
The Llama Nemotron Safety Guard V2 is a leading open content safety model that scored the highest in overall average accuracy—81.6%—during NVIDIA testing. It was trained using the Nemotron Content Safety Dataset V2, featuring more than 33K annotated human-LLM interactions. Built on the Llama 3.1 8B Instruct model, it classifies prompts and responses as safe or unsafe and flags violations using the NVIDIA detailed safety risk taxonomy.
Among agents, Nemotron-CORTEXA stands out as a state-of-the-art software engineering agent designed to resolve real issues in GitHub repositories. It identifies the correct source files and code snippets, generates multiple bug fixes and unit tests, and selects the best solution using an LLM-as-a-judge strategy. It solves 68.2% of issues in the SWE-bench Verified set, demonstrating both resolution accuracy and efficiency.
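The generate-then-judge pattern described above can be sketched in a few lines. The judge here is a stub heuristic standing in for an LLM call; in a real agent like Nemotron-CORTEXA, each candidate fix would be graded by a model (and its unit-test results) before selection. All function names below are illustrative.

```python
def judge_score(candidate: str) -> float:
    """Stub judge. In practice, prompt an LLM to rate the candidate fix
    (e.g. on correctness and unit-test outcomes) and parse a numeric score.
    This toy heuristic simply prefers shorter, more targeted patches."""
    return -len(candidate)

def select_best_fix(candidates: list[str]) -> str:
    """LLM-as-a-judge selection: return the candidate with the highest score."""
    return max(candidates, key=judge_score)

# Toy example: three candidate patches for the same issue.
candidates = [
    "patch_a: rewrite the whole module",
    "patch_b: fix off-by-one",
    "patch_c: wrap everything in try/except",
]
print(select_best_fix(candidates))  # the shortest patch wins under the stub judge
```

Generating several diverse candidates and letting a judge pick among them trades extra inference compute for a higher chance that at least one candidate is correct.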
The Nemotron team has also open-sourced the datasets used to train these models, which have been trending at the top of Hugging Face leaderboards.
The OpenMathReasoning dataset is designed to train LLMs in advanced mathematical problem-solving. Similarly, the OpenCodeReasoning dataset focuses on enhancing LLM capabilities in code generation and reasoning, comprising competitive programming challenges paired with high-quality solutions generated by models such as DeepSeek-R1.
Nemotron-Personas is an open source dataset of synthetic personas aligned with real U.S. demographic and geographic distributions to reflect population diversity across attributes like age, education, occupation, and ethnicity. Designed using Gretel Data Designer to improve diversity and complexity in synthetic data, as well as to reduce model bias, it supports a variety of domains and use cases.
Get started with NVIDIA Nemotron models
Try the Mistral-Nemotron NIM directly from your browser. Stay tuned for a downloadable NIM coming soon. You can also access the previously released Llama Nemotron models and training datasets:
- Try the Llama Nemotron Nano, Super, and Ultra models directly from your browser.
- Download the Llama Nemotron family collection and the datasets such as OpenMathReasoning, OpenCodeReasoning, and Llama Nemotron Post Training datasets from Hugging Face.