NVIDIA NeMo

NVIDIA NeMo™ is an open-source framework for developers to build and train state-of-the-art conversational AI models.


Download Now



What Is NVIDIA NeMo?


NVIDIA NeMo is a framework for building, training, and fine-tuning GPU-accelerated speech and natural language understanding (NLU) models with a simple Python interface. Using NeMo, developers can create new model architectures and train them using mixed- precision compute on Tensor Cores in NVIDIA GPUs through easy-to-use application programming interfaces (APIs).

NeMo Megatron is a part of the framework that provides parallelization technologies such as pipeline and tensor parallelism from the Megatron-LM research project for training large-scale language models.

With NeMo, you can build models for real-time automated speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications such as video call transcriptions, intelligent video assistants, and automated call center support across healthcare, finance, retail, and telecommunications.



Benefits

Rapid Model Building

Configure, build, and train models quickly with simple Python APIs.

Customizable Models

Download and customize pre-trained state-of-the-art models from the NVIDIA NGC™ catalog.

Widely Integrated


Interoperate NeMo with PyTorch and the PyTorch Lightning ecosystem.

Easy Deployment


Apply NVIDIA® TensorRT™ optimizations and export to NVIDIA Riva for high-performance inference




Ping An addresses millions of queries from customers each day using chatbot agents. As an early partner of the Riva early access program, we were able to use the tools and build better solutions with higher accuracy and lower latency, thus providing better services. More specifically, with NeMo, the pre-trained model, and the ASR pipeline optimized using Riva, the system achieved 5 percent improvement on accuracy, so as to serve our customers with better experience.


— Dr. Jing Xiao, Chief Scientist at Ping An

ping am

In our evaluation of Riva for virtual assistants and speech analytics, we saw remarkable accuracy by fine-tuning the automated speech recognition models in the Russian language using the NeMo toolkit in Riva. Riva can provide up to 10X throughput performance with powerful TensorRT optimizations on models, so we’re looking forward to using Riva to get the most out of these technology advancements.


— Nikita Semenov, Head of Machine Learning at MTS AI

mts

InstaDeep delivers decision-making AI products and solutions for enterprises. For this project, our goal is to build a virtual assistant in the Arabic language, and NVIDIA Riva played a significant role in improving the application’s performance. Using the NeMo toolkit in Riva, we were able to fine-tune an Arabic speech-to-text model to get a word error rate as low as 7.84 percent and reduced the training time of the model from days to hours using GPUs. We look forward to integrating these models in Riva’s end-to-end pipeline to ensure real-time latency.

— Karim Beguir, CEO and Co-founder at InstaDeep

instadeep

Through the NVIDIA Riva early access program, we’ve been able to power our conversational AI products with state-of-the-art models using NVIDIA NeMo, significantly reducing the cost of getting started. Riva speech recognition has amazingly low latency and high accuracy. Having the flexibility to deploy on-prem and offer a range of data privacy and security options to our customers has helped us position our conversational AI-enabled products in new industry verticals.

— Rajesh Jha, CEO of Siminsights

siminsights

At MeetKai, we build virtual assistants that make people's lives easier. When we started our company, we faced engineering and production challenges because there weren’t many high-quality, open-source conversational AI toolkits. NVIDIA NeMo helped our engineering efforts by providing easy-to-use APIs and reducing our costs by 25 percent. We look forward to continuing to work with NeMo to create the ultimate AI helper.

— James Kalpan, CEO of MeetKai

meetkai

Kensho leverages S&P Global's world-class data and research to build amazing tools that help people make fact-based decisions. Using NVIDIA NeMo on GPUs, Kensho successfully transcribed tens of thousands of earnings calls, management presentations, and acquisition calls, unlocking double-digit accuracy improvements and enabling S&P Global to increase earnings call coverage by more than 25 percent.


— Keenan Freyberg, Product Manager at Kensho

kensho

Our goal with SpeechBrain at MILA is to build an all-in-one toolkit that can significantly speed up research and development for speech models. We’re interested in pushing the boundaries of speech technologies even further by integrating with NeMo modules, particularly speech recognition and language modeling.



— Mirco Ravanelli, Speech and Deep Learning Scientist at MILA

MILA


NeMo Overview


Easily Compose New Model Architectures


NeMo includes domain-specific collections for ASR, NLP and TTS to develop state-of-the-art models such as Citrinet, Jasper, BERT, Fastpitch, and HiFiGAN. A NeMo model is composed of neural modules, which are the building blocks of models. The inputs and outputs of these modules are strongly typed with neural types that can automatically perform the semantic checks between the modules.

NeMo is designed to offer high flexibility and you can use the Hydra framework to modify the behavior of models easily. For instance, you can modify the architecture of the Citrinet Encoder module in the following diagram using Hydra.

Nemo New Model Architectures
Figure 1: ASR pipeline using NeMo modules


Train State-of-the-Art Conversational AI Models



Several NeMo pre-trained, state-of-the-art models in NGC are trained for over 100,000 hours on NVIDIA DGX™ across open-source, free datasets. You can fine-tune these models or modify them with NeMo before training for your use case.

NeMo uses mixed precision on Tensor Cores to speed up training upto 4.5X on a single GPU versus FP32 precision. You can further scale training to multi-GPU systems and multi-node clusters.

Figure 2: Highly accurate pre-trained models


Large-Scale Language Modeling with NeMo Megatron


Figure 3: Training large-scale language models with NVIDIA NeMo Megatron

Large-scale, Transformer-based language models are being applied to a broad set of natural language tasks such as text generation, summarization, and chatbots. NeMo Megatron provides capabilities to curate training data and train large-scale models with up to trillions of parameters. It performs data curation tasks such as formatting, filtering, deduplication, and blending that can otherwise take months. It also uses tensor parallelism to scale models within nodes, and data and pipeline parallelism to scale data and models across nodes.

NeMo Megatron delivers high training efficiency across thousands of GPUs and makes it practical for enterprises to train large-scale NLP.

NeMo Megatron can export trained models to the NVIDIA Triton™ Inference Server to run large-scale NLP models on multiple GPUs and multiple nodes.


Apply for Early Access


Flexible, Open-Source, Rapidly Expanding Ecosystem


NeMo is built on top of PyTorch and PyTorch Lightning, providing an easy path for researchers to develop and integrate with modules with which they are already comfortable. PyTorch and PyTorch lightning are open-source python libraries that provide modules to compose models.

To provide the researcher's flexibility to customize the models/modules easily, NeMo integrated with the Hydra framework. Hydra is a popular framework that simplifies the development of complex conversational AI models.

NeMo is available as an open-source so that researchers can contribute to and build on it.

Figure 4: NeMo Integration with PyTorch and PyTorch Lightning


Deploy to Production


Figure 5: NeMo to Riva deployment

To deploy NeMo speech models in production with NVIDIA Riva, developers should export NeMo models to a format compatible with Riva and then execute Riva build and deploy commands for creating an optimized skill that can run in real-time.

The documentation includes detailed instructions for exporting and deploying NeMo models to Riva.



Popular Frameworks


NeMo is built on top of the popular PyTorch framework and facilitates researchers to use the NeMo modules with PyTorch applications.


Learn More

NeMo with Pytorch Lightning enables easy and performant multi-GPU/multi-node mixed-precision training.


Learn More

Hydra is a flexible solution that allows researchers to configure NeMo modules and models quickly from a config file and command line.


Learn More



Data Generation and Data Annotation Partners

NVIDIA NeMo provides the capability to train and fine tune state-of-the-art models built using it. Fine-tuning models requires high quality labeled data, which might not be readily available. NeMo is integrated with several easy-to-use speech and language data labeling tools to help acquire labeled data as well as label custom data.

Learn more

Provides off-the-shelf training data in multiple languages and domains.


Learn more

Generates high-quality labels and delivers accurate results in production.


Learn more

Sets the standard in data labeling and extracts valuable insights from raw data.


Learn More


Leading Adopters

 
speechbrain
 
 
 

Resources


Get Started with Tutorials

Check out tutorials to get up and running quickly with state-of-the-art speech and language models.

Learn more

Take a NeMo Tour

Understand the advantages of using NVIDIA NeMo with a Jupyter Notebook walkthrough.

Read Blog

Explore More Conversational AI Blogs

Keep yourself up to date by learning what's new and upcoming in conversational AI.

Explore Blogs

NeMo is available to download from NGC. You can also download with pip install command and Docker container from NVIDIA NeMo GitHub repository

Download Now