

NVIDIA NeMo™ is an open-source framework for developers to build and train state-of-the-art conversational AI models.

Download now

What Is NVIDIA NeMo?

NVIDIA NeMo, part of the NVIDIA AI platform, is a framework for building, training, and fine-tuning GPU-accelerated speech and natural language understanding (NLU) models with a simple Python interface. Using NeMo, developers can create new model architectures and train them using mixed-precision compute on Tensor Cores in NVIDIA GPUs through easy-to-use application programming interfaces (APIs).

NeMo Megatron is a part of the framework that provides parallelization technologies such as pipeline and tensor parallelism from the Megatron-LM research project for training large-scale language models.
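As a rough illustration of the idea behind tensor parallelism, the sketch below splits the columns of a weight matrix across simulated devices, lets each one compute its slice of the output, and concatenates the results. It is a toy in pure Python, not NeMo or Megatron-LM code; the function names are hypothetical and the "devices" are just list entries. Pipeline parallelism, by contrast, assigns consecutive layers of a model to different devices.

```python
# Toy illustration of tensor (column) parallelism in pure Python.
# Not NeMo or Megatron-LM code; "devices" are simulated as list entries.

def matmul(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def split_columns(w, parts):
    """Split the columns of w into `parts` equal shards, one per device."""
    cols = len(w[0])
    assert cols % parts == 0, "column count must divide evenly"
    step = cols // parts
    return [[row[k * step:(k + 1) * step] for row in w] for k in range(parts)]

def tensor_parallel_matmul(x, w, parts):
    """Each device multiplies x by its own shard; outputs are concatenated."""
    out = []
    for shard in split_columns(w, parts):
        out.extend(matmul(x, shard))
    return out

x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

full = matmul(x, w)                              # single-device result
sharded = tensor_parallel_matmul(x, w, parts=2)  # two simulated devices
print(full == sharded)
```

Each simulated device stores only half of the weight matrix, which is why this kind of sharding helps when a layer is too large for the memory of one GPU.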

With NeMo, you can build models for real-time automated speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications such as video call transcriptions, intelligent video assistants, and automated call center support across healthcare, finance, retail, and telecommunications.

Explore the benefits of NeMo.


Build models rapidly.

Configure, build, and train models quickly with simple Python APIs.


Customize models.

Download and customize pre-trained state-of-the-art models from the NVIDIA NGC™ catalog.


Use wide integrations.

Interoperate NeMo with PyTorch and the PyTorch Lightning ecosystem.


Deploy with NVIDIA AI.

Leverage NVIDIA AI by applying NVIDIA® TensorRT™ optimizations and exporting to NVIDIA Riva for high-performance inference.

Get a NeMo overview.

Figure 1: ASR pipeline using NeMo modules

Easily Compose New Model Architectures.

NeMo includes domain-specific collections for ASR, NLP, and TTS to develop state-of-the-art models such as Citrinet, Jasper, BERT, FastPitch, and HiFiGAN. A NeMo model is composed of neural modules, which are the building blocks of models. The inputs and outputs of these modules are strongly typed with neural types, which automatically perform semantic checks between modules.
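To make the idea of typed module connections concrete, here is a minimal sketch of a semantic check between modules. It is a toy written in plain Python, not NeMo's implementation: `ToyNeuralType`, `ToyModule`, and `connect` are hypothetical names, and NeMo's own neural types are richer than a pair of fields.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyNeuralType:
    axes: tuple    # layout, e.g. ("B", "D", "T") for batch, features, time
    element: str   # semantic meaning of the values, e.g. "spectrogram"

class ToyModule:
    def __init__(self, name, input_type, output_type):
        self.name = name
        self.input_type = input_type
        self.output_type = output_type

def connect(upstream, downstream):
    """Raise TypeError unless upstream's output type equals downstream's input type."""
    if upstream.output_type != downstream.input_type:
        raise TypeError(
            f"{upstream.name} -> {downstream.name}: "
            f"{upstream.output_type} does not match {downstream.input_type}"
        )
    return (upstream.name, downstream.name)

audio = ToyNeuralType(("B", "T"), "audio_signal")
spectrogram = ToyNeuralType(("B", "D", "T"), "spectrogram")
encoded = ToyNeuralType(("B", "D", "T"), "encoded_representation")
log_probs = ToyNeuralType(("B", "T", "D"), "log_probabilities")

preprocessor = ToyModule("preprocessor", audio, spectrogram)
encoder = ToyModule("encoder", spectrogram, encoded)
decoder = ToyModule("decoder", encoded, log_probs)

print(connect(preprocessor, encoder))   # types match
print(connect(encoder, decoder))        # types match
try:
    connect(preprocessor, decoder)      # same axes, different meaning
except TypeError as err:
    print("rejected:", err)
```

Note that the spectrogram and the encoder output share the same axes here; the connection is rejected because their semantic element differs, which is the kind of mistake a shape check alone would miss.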

NeMo is designed to offer high flexibility, and you can use the Hydra framework to modify the behavior of models easily. For instance, you can use Hydra to modify the architecture of the Citrinet Encoder module shown in Figure 1.
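Hydra lets you change configuration values with `key.subkey=value` overrides. The sketch below mimics that idea on a nested dictionary using only the standard library. It is not Hydra itself, and the config fields (`model.encoder.filters`, `model.optim.lr`) are hypothetical placeholders rather than fields taken from a real NeMo config.

```python
import ast
import copy

def parse_value(text):
    """Turn '1024' into 1024 and '0.01' into 0.01; leave other text as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text

def apply_overrides(config, overrides):
    """Return a copy of a nested dict with `a.b.c=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            node = node[key]   # KeyError if the path does not exist
        node[leaf] = parse_value(raw_value)
    return result

# Hypothetical base configuration (real NeMo configs are YAML files).
base = {
    "model": {
        "encoder": {"filters": 256, "activation": "relu"},
        "optim": {"lr": 0.001},
    }
}

updated = apply_overrides(base, ["model.encoder.filters=1024", "model.optim.lr=0.01"])
print(updated["model"]["encoder"])
```

Working on a deep copy keeps the base configuration untouched, so several experiment variants can be derived from the same starting point.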

Train state-of-the-art conversational AI models.

Several NeMo pre-trained, state-of-the-art models in NGC are trained for over 100,000 hours on NVIDIA DGX™ across open-source, free datasets. You can fine-tune these models or modify them with NeMo before training for your use case.
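As a hedged sketch of what loading one of these pre-trained checkpoints can look like, the helper below wraps NeMo's `from_pretrained` and `transcribe` calls. The helper name `transcribe_files` and the default model name are assumptions made for illustration; check the NGC catalog for the checkpoints that are currently published, and note that the return type of `transcribe` can differ between NeMo versions.

```python
def transcribe_files(audio_paths, model_name="stt_en_citrinet_256"):
    """Transcribe audio files with a pre-trained NeMo ASR checkpoint.

    The default model name is an assumption, not a recommendation.
    """
    if not audio_paths:
        return []  # nothing to do, and avoids downloading a model

    # Imported lazily so this file can be loaded without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Downloads the checkpoint on first use, then restores the model.
    model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    return model.transcribe(audio_paths)

# Example call (requires NeMo and a local audio file; the path is hypothetical):
# print(transcribe_files(["sample.wav"]))
```

The restored model is a regular PyTorch Lightning module, so it can serve as the starting point for fine-tuning on your own data rather than being used only for inference.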

NeMo uses mixed precision on Tensor Cores to speed up training by up to 4.5X on a single GPU versus FP32 precision. You can further scale training to multi-GPU systems and multi-node clusters.
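Mixed-precision training generally pairs half-precision arithmetic with a higher-precision copy of the weights and with loss scaling. The sketch below shows why, using only the standard library to round values to FP16. It illustrates the numerics in general and is not NeMo code; the values and the scale factor of 1024 are arbitrary choices.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# 1) A small weight update vanishes if the weight itself is stored in FP16.
weight, update = 1.0, 1e-4
fp16_result = to_fp16(to_fp16(weight) + to_fp16(update))
# Python floats are 64-bit; here they stand in for a higher-precision master copy.
master_result = weight + update

# 2) A tiny gradient underflows to zero in FP16 unless it is scaled first.
grad = 1e-8
loss_scale = 1024.0
unscaled = to_fp16(grad)             # too small for FP16
scaled = to_fp16(grad * loss_scale)  # representable after scaling
recovered = scaled / loss_scale      # unscale in higher precision

print(fp16_result, master_result)
print(unscaled, recovered)
```

The first case is why a higher-precision copy of the weights is usually kept for the optimizer step, and the second is why the loss is scaled up before the backward pass and the gradients are scaled back down afterwards.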

Figure 2: Highly accurate pre-trained models

Scale large language modeling with NeMo Megatron.

Figure 3: Training large-scale language models with NVIDIA NeMo Megatron

NeMo Megatron is an end-to-end containerized framework that delivers high training efficiency across thousands of GPUs and makes it practical for enterprises to build and deploy large-scale models. It provides capabilities to curate training data, train large-scale models up to trillions of parameters, customize using prompt learning techniques, and deploy using NVIDIA Triton™ Inference Server to run large-scale models on multiple GPUs and multiple nodes.

NeMo Megatron is optimized to run on NVIDIA DGX SuperPOD™, Amazon Web Services, Oracle Cloud Infrastructure, and Microsoft Azure.

Learn more about NeMo Megatron
Register for Open Beta

Run large language models anywhere using NeMo LLM Service.

NeMo LLM Service, running on the NVIDIA AI platform, gives enterprise developers the fastest path to customize and use LLMs for their AI applications. It can be used anywhere: on private infrastructure, in the cloud, or through an API.

NeMo LLM Service supports models trained on several frameworks and can run out of the box or be customized for specific use cases. NeMo LLM Service includes the Megatron 530B model for quick experimentation with the world’s most powerful language model.

Learn more about NeMo LLM Service
Apply for NeMo LLM Service early access
Figure 4: NeMo LLM Service provides the fastest path to customize and use foundation LLMs and deploy on private and public clouds.

See the flexible, open-source, rapidly expanding ecosystem.

Figure 5: NeMo Integration with PyTorch and PyTorch Lightning

NeMo is built on top of PyTorch and PyTorch Lightning, providing an easy path for researchers to develop and integrate with modules they are already comfortable with. PyTorch and PyTorch Lightning are open-source Python libraries that provide modules to compose models.

To give researchers the flexibility to customize models and modules easily, NeMo is integrated with the Hydra framework. Hydra is a popular framework that simplifies the development of complex conversational AI models.

NeMo is available as open source so that researchers can contribute to and build on it.

Deploy to production using NVIDIA AI.

To leverage the NVIDIA AI platform and deploy NeMo speech models in production with NVIDIA Riva, developers should export NeMo models to a format compatible with Riva and then run the Riva build and deploy commands to create an optimized skill that can run in real time.

The documentation includes detailed instructions for exporting and deploying NeMo models to Riva.

Figure 6: NeMo to Riva deployment

Use with popular frameworks.

NeMo is built on top of the popular PyTorch framework and makes it easy for researchers to use NeMo modules with PyTorch applications.

Learn more

NeMo with PyTorch Lightning enables easy and performant multi-GPU/multi-node mixed-precision training.

Learn more

Hydra is a flexible solution that allows researchers to configure NeMo modules and models quickly from a config file and the command line.

Learn more

Meet NeMo’s data generation and data annotation partners.

NVIDIA NeMo lets you train and fine-tune state-of-the-art models. Fine-tuning models requires high-quality labeled data, which might not be readily available. NeMo is integrated with several easy-to-use speech and language data labeling tools that help you acquire labeled data as well as label custom data.

Learn more

Provides off-the-shelf training data in multiple languages and domains.

Learn more

Generates high-quality labels and delivers accurate results in production.

Learn more

Sets the standard in data labeling and extracts valuable insights from raw data.

Learn more

Leading Adopters


Find more resources.

Get Started with Tutorials

Check out tutorials to get up and running quickly with state-of-the-art speech and language models.

Learn more

Take a NeMo Tour

Understand the advantages of using NVIDIA NeMo with a Jupyter Notebook walkthrough.

Read blog

Explore More Conversational AI Blogs

Keep yourself up to date by learning what's new and upcoming in conversational AI.

Explore blogs

NeMo is available to download from NGC. You can also install it with the pip install command or as a Docker container by following the NVIDIA NeMo GitHub repository.

Download now