NVIDIA NeMo is an open-source toolkit for developing state-of-the-art conversational AI models.


Building state-of-the-art conversational AI models requires researchers to experiment quickly with novel network architectures. Doing so typically involves the complex, time-consuming work of modifying multiple networks and verifying that their inputs, outputs, and data preprocessing layers remain compatible.

NVIDIA NeMo is a Python toolkit for building, training, and fine-tuning GPU-accelerated conversational AI models. Its easy-to-use application programming interfaces (APIs) let researchers and developers build state-of-the-art conversational AI models quickly. NeMo runs mixed-precision compute on the Tensor Cores in NVIDIA GPUs and scales easily to multiple GPUs to maximize training performance.

NeMo is used to build models for real-time automated speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications such as video call transcriptions, intelligent video assistants, and automated call center support across healthcare, finance, retail, and telecommunications.

Simple Python APIs

Build models with readily available modules in NeMo by chaining them together.
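As a rough illustration of the chaining idea, the sketch below composes plain Python callables into a pipeline, where each stage consumes the previous stage's output. The `chain` helper and the toy stages (`load_audio`, `extract_features`, `encode`, `decode`) are hypothetical stand-ins for NeMo modules, not NeMo's actual API.

```python
from functools import reduce
from typing import Any, Callable, List

# Hypothetical: a "module" here is any callable with one input and one output.
Module = Callable[[Any], Any]


def chain(*modules: Module) -> Module:
    """Compose modules left to right: chain(a, b, c)(x) == c(b(a(x)))."""
    def pipeline(x: Any) -> Any:
        return reduce(lambda acc, module: module(acc), modules, x)
    return pipeline


# Hypothetical toy stages standing in for an ASR pipeline.
def load_audio(path: str) -> List[float]:
    # Pretend every file decodes to the same short signal.
    return [0.0, 0.5, -0.5, 1.0]


def extract_features(signal: List[float]) -> List[float]:
    # Toy "feature": absolute amplitude of each sample.
    return [abs(s) for s in signal]


def encode(features: List[float]) -> List[int]:
    # Toy "encoder": threshold each feature into a token id.
    return [1 if f >= 0.5 else 0 for f in features]


def decode(token_ids: List[int]) -> str:
    # Toy "decoder": map token ids to characters.
    vocab = {0: "_", 1: "a"}
    return "".join(vocab[t] for t in token_ids)


asr = chain(load_audio, extract_features, encode, decode)
print(asr("sample.wav"))  # prints _aaa
```

Because every stage shares the same one-in, one-out shape, swapping a stage (for example, a different encoder) means changing one argument to `chain` rather than rewriting the pipeline.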


Speed up model training by enabling mixed-precision, multi-GPU, and multi-node compute.

Deploy in Production Easily

Export models to NVIDIA Jarvis with a single command and apply NVIDIA® TensorRT™ optimizations for inference.

Easily Compose New Model Architectures

NeMo includes modules for building state-of-the-art models such as QuartzNet, Jasper, BERT, Tacotron2, and WaveGlow. Its simple APIs let you import datasets in the formats used by popular frameworks, such as Kaldi, and by deep learning-based models.

Neural types help you catch semantic and dimensionality errors between connected modules when these models are compiled.
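To show the idea behind neural types, the sketch below attaches a type (named axes plus a semantic element label) to each module's input and output, then validates every connection before any data flows. NeMo's own neural type classes differ from this; `NeuralType`, `Module`, `check_chain`, and the axis and element names are hypothetical and deliberately simplified.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NeuralType:
    """Hypothetical simplified neural type: named axes plus a semantic label."""
    axes: Tuple[str, ...]  # e.g. ("batch", "channel", "time")
    element: str           # e.g. "AudioSignal", "MelSpectrogram"


@dataclass(frozen=True)
class Module:
    name: str
    input_type: NeuralType
    output_type: NeuralType


def check_chain(modules: List[Module]) -> None:
    """Validate each producer -> consumer connection without running any data.

    Raises TypeError for the first mismatch: a dimensionality error if the
    number of axes differs, a semantic error if the axes or labels differ.
    """
    for producer, consumer in zip(modules, modules[1:]):
        out_t, in_t = producer.output_type, consumer.input_type
        if len(out_t.axes) != len(in_t.axes):
            raise TypeError(
                f"{producer.name} -> {consumer.name}: dimensionality mismatch "
                f"({len(out_t.axes)}D vs {len(in_t.axes)}D)"
            )
        if out_t.axes != in_t.axes or out_t.element != in_t.element:
            raise TypeError(
                f"{producer.name} -> {consumer.name}: semantic mismatch "
                f"({out_t.element} vs {in_t.element})"
            )


audio = NeuralType(("batch", "time"), "AudioSignal")
mel = NeuralType(("batch", "channel", "time"), "MelSpectrogram")
encoded = NeuralType(("batch", "channel", "time"), "EncodedRepresentation")

preprocessor = Module("preprocessor", audio, mel)
encoder = Module("encoder", mel, encoded)

check_chain([preprocessor, encoder])  # passes: the types line up

try:
    # Wrong order: the encoder emits 3D data, the preprocessor expects 2D audio.
    check_chain([encoder, preprocessor])
except TypeError as err:
    print(err)  # encoder -> preprocessor: dimensionality mismatch (3D vs 2D)
```

Note that a shape check alone would miss some errors: a mel spectrogram and an encoder output can have identical axes, so the semantic label is what distinguishes them.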

Figure 1: ASR pipeline using NeMo modules

Accelerated Training and Fine-Tuning with GPUs

Figure 2: Multi-GPU, multi-node training

NeMo uses mixed precision on Tensor Cores to speed up training by 4.5X on a single GPU compared with FP32 precision. You can further scale training to multi-GPU systems and multi-node clusters by changing a single parameter.
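Mixed-precision training commonly relies on loss scaling: tiny gradient values that would underflow to zero in FP16 are multiplied by a scale factor before being stored in FP16, then divided back out in higher precision. The sketch below is a generic illustration of that technique, not NeMo's implementation. It emulates FP16 rounding with the half-precision format character (`"e"`) of Python's `struct` module; the gradient value and the scale factor of 2**16 are arbitrary illustrative choices.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def unscaled_fp16_grad(grad: float) -> float:
    # Store the gradient directly in FP16.
    return to_fp16(grad)


def scaled_fp16_grad(grad: float, loss_scale: float) -> float:
    # Scale up before storing in FP16, then unscale in double precision.
    return to_fp16(grad * loss_scale) / loss_scale


grad = 1e-8          # smaller than FP16's smallest subnormal (about 5.96e-8)
scale = 2.0 ** 16    # illustrative loss scale

print(unscaled_fp16_grad(grad))       # 0.0: the gradient underflows
print(scaled_fp16_grad(grad, scale))  # nonzero and close to 1e-8
```

The scale cannot grow without limit: scaled values above FP16's maximum of 65504 overflow, which is why training frameworks typically adjust the scale dynamically rather than fixing it.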

NGC™ includes hundreds of pretrained state-of-the-art models that were trained for more than 100,000 hours on NVIDIA DGX™ systems using open and proprietary datasets. You can fine-tune these models for your use case, or modify them with NeMo before training.

Deploy in Real-Time Services

With a single command, NeMo models can be exported to NVIDIA Jarvis services for high-performance inference. Models can also be exported in ONNX, PyTorch, and TorchScript formats.

Jarvis applies powerful TensorRT optimizations and sets up the service so you can access these models through a standard API.

Figure 3: NeMo to Jarvis integration


NeMo is available to download from NGC. You can also install it with pip or run it in a Docker container by following the instructions in the NeMo GitHub repository.
