NVIDIA Nemotron
NVIDIA Nemotron™ is a family of open models with open weights, training data, and recipes, delivering leading efficiency and accuracy for building specialized AI agents.
NVIDIA Nemotron Models
Nemotron models are transparent—the training data used for these models, as well as their weights, are open and available on Hugging Face for you to evaluate before deploying them in production. The technical reports outlining the steps necessary to recreate these models are also freely available.
The new Nemotron 3 family provides the most efficient multimodal models, powered by hybrid Mamba‑Transformer MoE with 1M-token context, delivering top accuracy for complex, high-throughput agentic AI applications.
Easily deploy models using open frameworks like vLLM, SGLang, Ollama and llama.cpp on any NVIDIA GPUs—from the edge and cloud to the data center. Endpoints are also available as NVIDIA NIM™ microservices for easy deployment on any GPU-accelerated system.
Nemotron reasoning models are optimized for various platforms:
Nano provides cost efficiency with high accuracy specialized sub-agents—now with multimodal capabilities with Nano Omni..
Super delivers highest efficiency with leading accuracy for reasoning and tool calling for multi-agent applications.
-
Ultra is designed for applications demanding the highest reasoning accuracy for complex agentic tasks.
Additionally, these models provide the highest throughput, enabling agents to think faster and generate higher-accuracy responses while lowering inference cost.
Nemotron models are also available for visual understanding, information retrieval, speech, and safety.
Nemotron 3 Nano 30B A3B
- Nemotron 3 Nano offers 4x faster throughput compared to Nemotron 2 Nano
- Leading accuracy for coding, reasoning, math and long context tasks
- Perfect for agents that need to deliver highest accuracy and efficiency for targeted tasks
Nemotron 3 Nano Omni 30B A3B
- Single model for video, audio, image, and text understanding for a simplified agent workflow
- Multimodal reasoning for sub-agents within agentic use cases such as computer use agent, document intelligence, and video/audio understanding
- Highest in-class efficiency and with low costs
Nemotron 3 Super 120B A12B
- Highest in-class efficiency and leading accuracy
- Great for addressing complex tasks in multi-agent environment
- Suitable for single data center GPU deployments
Llama Nemotron Ultra 253B
- Ideal for multi-agent enterprise workflows requiring highest accuracy, such as customer service automation, supply chain management, and IT security
- Suitable for data center-scale deployments
Nemotron Parse
- Understands document semantics and extract text and tables elements with spatial grounding
- Overcomes traditional OCR limitations with support for multi-column layouts, LaTeX table extraction, markdown formatting, and reading-order reconstruction
- Designed to accelerate document intelligence pipelines for RAG, LLM training data curation, and agentic document workflows
Nemotron RAG
- Industry-leading extraction, embed, and rerank models
- Best-in-class accuracy for multimodal document intelligence, question answering, and passage retrieval
Nemotron Speech
- A family of open models optimized for high-throughput, ultra-low latency automatic speech recognition (ASR), text-to-speech (TTS), speech-to-speech (S2S), full-duplex, and neural machine translation (NMT) for agentic AI applications
- Nemotron Speech models with the NVIDIA Riva GPU-accelerated speech AI library deliver state-of-the-art ASR and TTS capabilities for seamless production deployment
Nemotron Safety
- Advanced multilingual, multimodal safety models that deliver high accuracy jailbreak detection, content moderation with cultural nuance, fine-grained PII detection, reasoning-based custom policy enforcement, and topic control for more secure and more compliant LLMs across global domains and use cases.
- NeMo Guardrails, a flexible, open library for defining and enforcing enterprise AI policies in real time—covering dialogue control, topic guidance, RAG grounding, tool‑call governance, safety filtering, and more—with parallel, low‑latency execution across custom, community, and NVIDIA safety rails.
NVIDIA Nemotron Datasets
Improve reasoning capabilities of large language models (LLMs) with one of the broadest commercially usable open data collections for agentic AI — spanning pre-training, post-training, personas, safety, RL, and RAG. Includes 10T+ tokens and 40M+ post-training samples, covering the full training lifecycle from foundation models to agent workflows.
Built with large-scale synthetic data generation, filtering, and curation — and released under permissive licenses. Developers can train, fine-tune, and evaluate models with full visibility into the data, accelerating development and reducing reliance on opaque datasets.
Nemotron Pre- and Post-Training Datasets
NVIDIA provides over 10T tokens of multilingual reasoning, coding, and safety data to help the community build their custom models.
Nemotron Personas Datasets
Fully synthetic, privacy-safe personas are grounded in real-world demographic, geographic, and cultural distributions. Part of NVIDIA’s growing global collection for Sovereign AI development, featuring datasets for USA, Japan, India, Singapore, Brazil, France, and South Korea.
Nemotron Omni Datasets
Multimodal data extending the Nemotron training pipeline beyond text to image, video, and speech. ~127B tokens of cross-modal pretraining data and ~124M curated post-training examples for document reasoning, computer use, and long-horizon workflows.
Nemotron Safety Datasets
High-quality, curated datasets built to power multilingual content safety, advanced policy reasoning, and threat-aware AI—spanning moderation data and audio-based safety signals for modern AI assistants.
Nemotron RL Datasets
Train models with the same reinforcement learning (RL) data powering Nemotron, including multi-turn trajectories, tool calls, and preference signals across coding, math, reasoning, and agentic tasks to build adaptive, reliable real-world AI.
Nemotron RAG Datasets
Unlock the foundation behind our leaderboard-topping model with the release of 15 meticulously curated datasets—spanning instruction-following, reasoning, coding, and evaluation data—to accelerate open research and transparent model development.
Developer Tools
NVIDIA NeMo
Simplify AI agent lifecycle management by fine-tuning, deploying, and continuously optimizing Nemotron models with NVIDIA NeMo™.
NVIDIA TensorRT-LLM
TensorRT™-LLM is an open-source library built to deliver high-performance, real-time inference optimization for large language models like Nemotron on NVIDIA GPUs. This open-source library is available on the TensorRT-LLM GitHub repo and includes a modular Python runtime, PyTorch-native model authoring, and a stable production API.
Open-Source Frameworks
Deploy Nemotron models using open-source frameworks such as Hugging Face transformers for development or vLLM for deployment and production use cases on all supported platforms.
Introductory Resources
Power Specialized AI Agents For Targeted Tasks With Efficient NVIDIA Nemotron 3 Nano Accuracy
NVIDIA Nemotron 3 Nano brings advanced reasoning and agentic capabilities with high efficiency using hybrid Transformer-Mamba MoE architecture and a configurable thinking budget—so you can dial accuracy, throughput, and cost to match your real‑world needs.
How to Build a Voice-Powered RAG Agent Using New Nemotron Models
Get a step-by-step guide on how to build a voice-powered RAG agent by integrating Nemotron models for speech, RAG, safety, and long-context reasoning.
Nemotron 3 Super: Open Hybrid Mamba-Transformer MoE for Agentic Reasoning
Nemotron 3 Super, a hybrid Mamba‑Transformer MoE model for large‑scale agentic AI, combines latent MoE, multi‑token prediction, and a 1M‑token context window for faster, more reliable long‑horizon reasoning. Native NVFP4 training, multi‑environment RL alignment, and fully open weights, datasets, recipes, and deployment cookbooks help developers quickly build and deploy customized agentic workflows.
Starter Kits
Start solving AI challenges by developing custom agents with NVIDIA Nemotron models for various use cases. Explore implementation scripts, explainer blogs, and more how-to documentation for various stages of AI development.
Build a Report Generation Agent With Nemotron
The workshop guides developers in building a report generation agent using NVIDIA Nemotron and LangGraph, focusing on four core considerations of AI agents: model, tools, memory and state, and routing.
Tutorial Video: Building a Report Generation Agent With NVIDIA Nemotron Nano v2
NVIDIA Launchable: Build an Agent Workshop
Learning Path: How to Build an AI Agent
Build a RAG Agent With Nemotron
In this self-paced workshop, gain a deep understanding of agentic retrieval-augmented generation (RAG) core principles, including the NVIDIA Nemotron model family, and learn how to build your own customized, shareable agentic RAG system using LangGraph within a turnkey, portable development environment.
Tutorial Video: Build a RAG Agent With NVIDIA Nemotron
On-Demand Livestream: Build a RAG Agent With NVIDIA Nemotron | Nemotron Labs
NVIDIA Launchable: Build an Agent Workshop
Learning Path: How To Build an Agent RAG Application
Build a Bash Computer Use Agent With Nemotron
In this self-paced workshop, gain a deep understanding of agentic retrieval-augmented generation (RAG) core principles, including the NVIDIA Nemotron model family, and learn how to build your own customized, shareable agentic RAG system using LangGraph within a turnkey, portable development environment.
Tutorial Video: Create a Bash Agent in One Hour
On-Demand Livestream: Build a Bash Computer Operator Agent | Nemotron Labs
Nemotron 3 Nano 30B A3B
Below are the resources that outline exactly how NVIDIA Research Teams trained the NVIDIA Nemotron 3 Nano model. From pretraining to final model checkpoint—everything is open and available for you to use and learn from.
Models: Nemotron 3 Model Collection
Datasets: Pretraining, Post training, and RL Dataset
Whitepaper: Nemotron 3 Whitepaper
Nemotron 3 Super 120B A3B
Below is a set of resources that outline the process NVIDIA used to produce the Nemotron 3 Super model.
Datasets: Pretraining, Post training, and RL Dataset
Models: Nemotron 3 Model Collection
Build a Voice Agent With RAG and Safety Guardrails With Nemotron
In this tutorial, you’ll learn how to build a voice-powered RAG agent with safety guardrails using Nemotron models. By the end, your agent will listen to spoken input, ground itself in your data, reason over long context, apply guardrails, and return safe answers as audio.
Tutorial Video: How to Build a Voice Agent With RAG and Safety Guardrails
Models: Speech
Models: RAG
Models: Safety
Models: Reasoning
Run Nemotron Models Across Hosted and Self-Managed Infrastructure
Run, scale, and evaluate Nemotron models on your own infrastructure or on managed infrastructure using hosted endpoints.
Focus on building agentic AI applications while providers handle optimized runtimes, elastic scaling, and production-ready deployment paths—so you can move faster from prototyping to production. For self-managed deployments, platforms like Canonical (Ubuntu, Kubernetes, and MLOps tooling) enable running Nemotron models (in private cloud, on-prem, or hybrid environments with full control over infrastructure.
Available providers:
You can also explore Nemotron model details, documentation, and access paths through the following discovery and access channels:
- LM Studio —Built-in interface and OpenAI-compatible API
- Ollama—CLI and developer-friendly local API
- llama.cpp—Lightweight, high-performance inference engine (GGUF models available via Hugging Face)
- Unsloth—Efficient local fine-tuning inference with optimized memory usage and performance
If you prefer to optimize the inference stack for your use cases, get started with cookbooks for vLLM, SGLang or TensorRT-LLM
You can also explore Nemotron model details, documentation, and access paths through the following discovery and access channels:
More Resources
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
NVIDIA has collaborated with Google DeepMind to watermark generated videos from the NVIDIA API catalog.
For more detailed information on ethical considerations for this model, please see the System Card, Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns here.