NVIDIA Nemotron

NVIDIA Nemotron™ is a family of most open models with open weights, training data, and recipes, delivering leading efficiency and accuracy for building specialized AI agents.

Explore ModelsForum


NVIDIA Nemotron Models

Nemotron models are incredibly transparent—the training data used for these models, as well as their weights, are open and available on Hugging Face for developers to evaluate before deploying them in production. The technical reports outlining the steps necessary to recreate these models are also freely available.

Nemotron models show strong performance across agentic benchmarks including scientific reasoning, advanced math, coding, function calling, instruction following, optical character recognition, and more, and the models can be further tuned with open tools for improving application-specific accuracy.
The model endpoints can be easily deployed using open frameworks like vLLM, SGLang, and llama.cpp and are also available as NVIDIA NIM™ microservices for easy deployment on any platform. 

The models are optimized for various platforms: 

  • Nano offers cost-efficiency at the edge.

  • Super balances accuracy and compute on a single GPU.

  • Ultra is designed for data center-scale deployments.


Additionally, these models provide up to 6x higher throughput, enabling agents to think faster and generate higher-accuracy response while lowering inference cost.

Nemotron Nano 2

Up to 6x faster throughput over leading 8B open models

Up to 60% lower token generation with new thinking budget feature

Perfect for applications that require real-time responses

Suitable for edge and single consumer-grade GPU deployments

Llama Nemotron Super 1.5

High in-class accuracy and throughput
Great for efficient deep research agents
Suitable for single data center GPU deployments

Llama Nemotron Ultra

Ideal for multi-agent enterprise workflows requiring highest accuracy, such as customer service automation, supply chain management, and IT security

Suitable for data center-scale deployments

Llama Nemotron Nano VL

Best-in-class vision language accuracy
Designed for document intelligence and information extraction

Suitable for single data center GPU deployments


NVIDIA Nemotron Datasets

Nemotron datasets are one of the largest open collections of synthetic data designed specifically to improve reasoning in large language models. With over 9T tokens of pre- and post-training data, the collection spans across math, coding, scientific knowledge, function calling, instruction following, and multi-step reasoning tasks. 

Generating, filtering, and curating this size of data is a huge undertaking, and by making the dataset openly available, researchers and developers can train, fine-tune, and evaluate models with greater transparency and build models faster.

Nemotron Pretraining Dataset

Build advanced models faster with a collection of high-quality pretraining datasets including math, coding, and multilingual Q&A.

Nemotron Post-Training Dataset 2

Explore the dataset for full transparency or customize your reasoning models faster with this multilingual reasoning dataset-enhancing math, coding, general reasoning, and instruction-following skills.

Llama Nemotron VLM Dataset

Get full transparency into the Llama Nemotron Nano VL model with the compilation of high-quality post-training datasets for understanding, querying, and summarizing images.


Developer Tools

NVIDIA NeMo

Simplify AI agent lifecycle management by fine-tuning, deploying, and continuously optimizing Nemotron models with NVIDIA NeMo™.

NVIDIA TensorRT-LLM

TensorRT™-LLM is an open-source library built to deliver high-performance, real-time inference optimization for large language models like Nemotron on NVIDIA GPUs. This open-source library is available on the TensorRT-LLM GitHub repo and includes a modular Python runtime, PyTorch-native model authoring, and a stable production API.

Open-Source Frameworks

Experience Nemotron models in open-source frameworks such as Hugging Face transformers for development or vLLM for deployment and production use cases on all supported platforms.


Introductory Resources

Supercharge Edge AI With High‑Accuracy Reasoning Using NVIDIA Nemotron Nano 2 9B

NVIDIA Nemotron Nano 2 9B brings reasoning capabilities to the edge with leading accuracy and efficiency with a hybrid Transformer-Mamba architecture and a configurable thinking budget—so you can dial accuracy, throughput, and cost to match your real‑world needs.

Build More Accurate and Efficient AI Agents With NVIDIA Llama Nemotron Super 1.5

AI agents now solve multi-step problems, write production-level code, and act as general assistants across multiple domains. But to reach their full potential, the systems need advanced reasoning models without being prohibitively expensive.

Open Dataset Preserves High-Value Math and Code, and Augments With Multilingual Reasoning

Build advanced reasoning models from carefully curated, high-signal web content and large-scale synthetic data.


Starter Kits

Start solving AI challenges by developing custom agents with NVIDIA Nemotron models for downstream use cases. Explore implementation scripts, explainer blogs, and more how-to documentation for various stages of AI development.

Nemotron Nano 2 9B

Below are the resources that outline exactly how NVIDIA Research Teams trained the NVIDIA Nemotron Nano 9B V2 model. From pretraining to final model checkpoint—everything is open and available for you to use and learn from.

Llama Nemotron Super 1.5 49B

Below are a set of resources that outline the process the NVIDIA Research Teams used to produce Llama 3.3 Nemotron Super 49B V1.5.


More Resources

 Decorative image representing Community

NVIDIA Developer Forums

Read NVIDIA Nemotron FAQ

Read NVIDIA Nemotron FAQ

Decorative image representing Developer Newsletter

Join Us on Discord


Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

NVIDIA has collaborated with Google DeepMind to watermark generated videos from the NVIDIA API catalog.

For more detailed information on ethical considerations for this model, please see the System Card, Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns here.

Get Started With NVIDIA Nemotron Today

Try Now