NVIDIA Nemotron

NVIDIA Nemotron™ is a family of open models with open weights, training data, and recipes, delivering leading efficiency and accuracy for building specialized AI agents.

Explore Models Forum Tutorials

NVIDIA Nemotron Models

Nemotron models are transparent—the training data used for these models, as well as their weights, are open and available on Hugging Face for you to evaluate before deploying them in production. The technical reports outlining the steps necessary to recreate these models are also freely available.

The new Nemotron 3 family provides the most efficient open models, powered by hybrid Mamba‑Transformer MoE with 1M-token context, delivering top accuracy for complex, high-throughput agentic AI applications.

Easily deploy models using open frameworks like vLLM, SGLang, Ollama and llama.cpp on any NVIDIA GPUs — from the edge and cloud to the data center. Endpoints are also available as NVIDIA NIM™ microservices for easy deployment on any GPU-accelerated system.

Nemotron reasoning models are optimized for various platforms:

Nano provides cost efficiency with high accuracy for targeted agentic tasks
Super delivers high accuracy for multi-agentic reasoning
Ultra is designed for applications demanding the highest reasoning accuracy

Additionally, these models provide the highest throughput, enabling agents to think faster and generate higher-accuracy responses while lowering inference cost.

Nemotron models are also available for visual understanding, information retrieval, speech, and safety.

Nemotron 3 Nano 30B A3B

Nemotron 3 Nano offers 4x faster throughput compared to Nemotron 2 Nano
Leading accuracy for coding, reasoning, math and long context tasks
Perfect for agents that need to deliver highest accuracy and efficiency for targeted tasks

Demo Model on OpenRouter

Download the Model on Hugging Face

Read Technical Report

Llama Nemotron Super 49B

High in-class accuracy and throughput
Great for efficient deep research agents
Suitable for single data center GPU deployments

Demo the Model on DeepInfra

Experience Model as NVIDIA NIM API

Download the Model on Hugging Face

Llama Nemotron Ultra 253B

Ideal for multi-agent enterprise workflows requiring highest accuracy, such as customer service automation, supply chain management, and IT security
Suitable for data center-scale deployments

Demo the Model on OpenRouter

Download the Model on Hugging Face

Nemotron Nano VL 12B

Best-in-class vision language accuracy
Designed for document intelligence and video understanding
Suitable for single data center GPU deployments

Experience the Model as an NVIDIA NIM API

Download the Model on Hugging Face

Nemotron RAG

Industry-leading extraction, embed, and rerank models
Best-in-class accuracy for multimodal document intelligence, question answering, and passage retrieval
Leading positions on ViDoRe V1, ViDoRe V2, and MTEB and MMTEB leaderboard

Experience Models as NVIDIA NIM APIs

Download the Models Hugging Face

Nemotron Safety

Safety models for advanced jailbreak detection, multilingual content safety with cultural nuance and reasoning capabilities, privacy detection, and topic control for secure and compliant LLM operations globally

Experience the Model as an NVIDIA NIM API

Download the Model Hugging Face

Nemotron Speech

A family of open models optimized for high-throughput, ultra-low latency automatic speech recognition (ASR), text-to-speech (TTS), and neural machine translation (NMT) for agentic AI applications.

Experience the Model as an NVIDIA NIM API

Download the Model Hugging Face

NVIDIA Nemotron Datasets

Improve reasoning capabilities of large language models (LLMs) with one of the broadest commercially usable open collections of synthetic data for agentic AI. The Nemotron dataset collection spans pre- and post-training, personas, safety, RL, and RAG datasets, including over 10T language tokens and 18 million supervised fine-tuning (SFT) data samples.

Generating, filtering, and curating this size of data is a huge undertaking making these datasets openly available under permissive licenses. Researchers and developers can now train, fine-tune, and evaluate models with greater transparency and build models faster.

Nemotron Pre- and Post-Training Datasets

NVIDIA provides over 10T tokens of multilingual reasoning, coding, and safety data to help the community build their custom models.

Explore on Hugging Face

Nemotron Personas Datasets

Fully synthetic, privacy-safe personas are grounded in real-world demographic, geographic, and cultural distributions. Part of NVIDIA’s growing global collection for Sovereign AI development, featuring datasets for USA, Japan, and India.

Explore on Hugging Face

Llama Nemotron VLM Dataset

High-quality post-training datasets, including visual question answering (VQA) and optical character recognition (OCR) annotations that power document understanding, querying, and summarization for our best-in-class Llama Nemotron Nano vision-language models.

Explore on Hugging Face

Nemotron Safety Datasets

High-quality, curated datasets built to power multilingual content safety, advanced policy reasoning, and threat-aware AI—spanning moderation data and audio-based safety signals for modern AI assistants.

Explore on Hugging Face

Nemotron RL Datasets

Train models with the same reinforcement learning (RL) data powering Nemotron, including multi-turn trajectories, tool calls, and preference signals across coding, math, reasoning, and agentic tasks to build adaptive, reliable real-world AI.

Explore on Hugging Face

Nemotron RAG Datasets

Unlock the foundation behind our leaderboard-topping model with the release of 15 meticulously curated datasets—spanning instruction-following, reasoning, coding, and evaluation data—to accelerate open research and transparent model development.

Explore on Hugging Face

Feature Request Board

Shape the future of Nemotron. Upvote your favorite features or suggest new ones.

Vote Now

Developer Tools

NVIDIA NeMo

Simplify AI agent lifecycle management by fine-tuning, deploying, and continuously optimizing Nemotron models with NVIDIA NeMo™.

Get Started

NVIDIA TensorRT-LLM

TensorRT™-LLM is an open-source library built to deliver high-performance, real-time inference optimization for large language models like Nemotron on NVIDIA GPUs. This open-source library is available on the TensorRT-LLM GitHub repo and includes a modular Python runtime, PyTorch-native model authoring, and a stable production API.

Leverage TensorRT-LLM for NVIDIA Nemotron Model Inference

Open-Source Frameworks

Deploy Nemotron models using open-source frameworks such as Hugging Face transformers for development or vLLM for deployment and production use cases on all supported platforms.

Explore Inference Backends

Introductory Resources

Power Specialized AI Agents For Targeted Tasks With Efficient NVIDIA Nemotron 3 Nano Accuracy

NVIDIA Nemotron 3 Nano brings advanced reasoning and agentic capabilities with high efficiency using hybrid Transformer-Mamba MoE architecture and a configurable thinking budget—so you can dial accuracy, throughput, and cost to match your real‑world needs.

Read Tech Blog

How to Build a Voice-Powered RAG Agent Using New Nemotron Models

Get a step-by-step guide on how to build a voice-powered RAG agent by integrating Nemotron models for speech, RAG, safety, and long-context reasoning.

Read Tech Blog

Open Dataset Preserves High-Value Math and Code, and Augments With Multilingual Reasoning

Build advanced reasoning models from carefully curated, high-signal web content and large-scale synthetic data.

Read Tech Blog

Starter Kits

Start solving AI challenges by developing custom agents with NVIDIA Nemotron models for various use cases. Explore implementation scripts, explainer blogs, and more how-to documentation for various stages of AI development.

Build a Report Generation Agent With Nemotron

The workshop guides developers in building a report generation agent using NVIDIA Nemotron and LangGraph, focusing on four core considerations of AI agents: model, tools, memory and state, and routing.

Blog: Build a Report Generator AI Agent With NVIDIA Nemotron on OpenRouter
Tutorial Video: Building a Report Generation Agent With NVIDIA Nemotron Nano v2
On-Demand Livestream: Build an AI Agent for Report Generation With NVIDIA Nemotron on OpenRouter | Nemotron Labs
NVIDIA Launchable: Build an Agent Workshop
Learning Path: How to Build an AI Agent

Build a RAG Agent With Nemotron

In this self-paced workshop, gain a deep understanding of agentic retrieval-augmented generation (RAG) core principles, including the NVIDIA Nemotron model family, and learn how to build your own customized, shareable agentic RAG system using LangGraph within a turnkey, portable development environment.

Blog: Build a Retrieval-Augmented Generation (RAG) Agent With NVIDIA Nemotron
Tutorial Video: Build a RAG Agent With NVIDIA Nemotron
On-Demand Livestream: Build a RAG Agent With NVIDIA Nemotron | Nemotron Labs
NVIDIA Launchable: Build an Agent Workshop
Learning Path: How To Build an Agent RAG Application

Build a Bash Computer Use Agent With Nemotron

Blog: Create Your Own Bash Computer Use Agent With NVIDIA Nemotron in One Hour
Tutorial Video: Create a Bash Agent in One Hour
On-Demand Livestream: Build a Bash Computer Operator Agent | Nemotron Labs

Nemotron 3 Nano 30B A3B

Below are the resources that outline exactly how NVIDIA Research Teams trained the NVIDIA Nemotron 3 Nano model. From pretraining to final model checkpoint—everything is open and available for you to use and learn from.

Paper: NVIDIA Nemotron Nano 3: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Models: Nemotron 3 Model Collection
Datasets: Pretraining, Post training, and RL Dataset
Whitepaper: Nemotron 3 Whitepaper
Cookbooks: Deploy Nemotron 3 Nano on NVIDIA GPUs with vLLM, SGLang, TRT-LLM
Notebook/Docs: Fine-Tune For Your Use-Case with Unsloth Notebook and Docs

Llama Nemotron Super 1.5 49B

Below are a set of resources that outline the process the NVIDIA Research Teams used to produce Llama 3.3 Nemotron Super 49B V1.5.

Run Nemotron Models on Inference Service Providers

Run, scale, and evaluate Nemotron models without managing infrastructure using hosted endpoints from leading inference service providers. Quickly experiment, benchmark, and deploy models across cloud and data center environments with optimized performance and cost efficiency.

Focus on building agentic AI applications while providers handle optimized runtimes, elastic scaling, and production-ready deployment paths—so you can move faster from prototyping to production.

Available providers:

You can also explore Nemotron model details, documentation, and access paths through the following discovery and access channels:

More Resources

NVIDIA Developer Forums

Try NVIDIA Nemotron Tutorials

Join Us on Discord

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

NVIDIA has collaborated with Google DeepMind to watermark generated videos from the NVIDIA API catalog.

For more detailed information on ethical considerations for this model, please see the System Card, Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns here.

Get Started With NVIDIA Nemotron Today

Try Now