NVIDIA NeMo is an open-source toolkit for developing state-of-the-art conversational AI models.
Building state-of-the-art conversational AI models requires researchers to quickly experiment with novel network architectures. This means going through the complex and time-consuming process of modifying multiple networks and verifying compatibility across inputs, outputs, and data pre-processing layers.
NVIDIA NeMo is a Python toolkit for building, training, and fine-tuning GPU-accelerated conversational AI models through a simple interface. With NeMo, researchers and developers can build state-of-the-art conversational AI models using easy-to-use application programming interfaces (APIs). NeMo runs mixed-precision compute on the Tensor Cores in NVIDIA GPUs and scales easily to multiple GPUs to deliver the highest training performance possible.
NeMo is used to build models for real-time automated speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications such as video call transcriptions, intelligent video assistants, and automated call center support across healthcare, finance, retail, and telecommunications.
Rapid Model Building
Configure, build, and train models quickly with simple Python APIs.
Download and customize pre-trained state-of-the-art models from NGC.
Interoperable with the PyTorch and PyTorch Lightning ecosystem.
Apply NVIDIA® TensorRT™ optimizations for inference and export to NVIDIA Jarvis with a single command.
Easily Compose New Model Architectures
NeMo includes domain-specific collections for ASR, NLP, and TTS to develop state-of-the-art models such as QuartzNet, Jasper, BERT, Tacotron2, and WaveGlow in three lines of code. A NeMo model is composed of Neural Modules, which are the building blocks of models. The inputs and outputs of these modules are strongly typed with Neural Types, which allow semantic checks between modules to be performed automatically.
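The idea of typed module ports can be sketched in plain Python. This is a minimal illustration only: `PortType`, `ToyModule`, and `check_connection` are hypothetical names invented for this example and are not NeMo's classes or API. The sketch shows how declaring a type on every input and output lets a mismatch between two modules be caught before any data flows.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for illustration only; these are NOT NeMo's classes.
@dataclass(frozen=True)
class PortType:
    axes: tuple    # layout of the tensor, e.g. ("batch", "time")
    element: str   # semantic kind of the data, e.g. "audio_signal"


class ToyModule:
    def __init__(self, name, input_types, output_types):
        self.name = name
        self.input_types = input_types    # port name -> PortType
        self.output_types = output_types  # port name -> PortType


def check_connection(producer, out_port, consumer, in_port):
    """Raise TypeError if the producer's output type differs from the consumer's input type."""
    produced = producer.output_types[out_port]
    expected = consumer.input_types[in_port]
    if produced != expected:
        raise TypeError(
            f"{producer.name}.{out_port} yields {produced}, "
            f"but {consumer.name}.{in_port} expects {expected}"
        )


spectrogram = PortType(("batch", "dim", "time"), "spectrogram")
encoding = PortType(("batch", "dim", "time"), "acoustic_encoding")

preprocessor = ToyModule(
    "preprocessor",
    input_types={"audio": PortType(("batch", "time"), "audio_signal")},
    output_types={"features": spectrogram},
)
encoder = ToyModule(
    "encoder",
    input_types={"features": spectrogram},
    output_types={"encoded": encoding},
)
decoder = ToyModule(
    "decoder",
    input_types={"encoded": encoding},
    output_types={"log_probs": PortType(("batch", "time", "dim"), "log_probabilities")},
)

# Compatible connections pass silently.
check_connection(preprocessor, "features", encoder, "features")
check_connection(encoder, "encoded", decoder, "encoded")

# Skipping the encoder is a semantic error even though the tensor layouts match.
try:
    check_connection(preprocessor, "features", decoder, "encoded")
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```

Note that the last check fails on the element kind alone: both ports share the same axes, so a shape-only check would not have caught the mistake.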
NeMo is designed for flexibility, and you can use the Hydra framework to modify the behavior of models easily. For instance, you can modify the architecture of the Jasper Encoder module in the following diagram using Hydra.
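Configuration-driven changes of this kind can be sketched with the standard library alone. The example below is a toy imitation of dotted `key=value` overrides in the style Hydra accepts on the command line; it does not use Hydra itself. The config keys and values and the helper names `parse_value` and `apply_overrides` are assumptions made up for illustration and do not reflect NeMo's actual configuration schema.

```python
import copy

# Hypothetical base config; keys and values are illustrative, not NeMo's schema.
base_config = {
    "model": {
        "encoder": {"activation": "relu", "feat_in": 64, "dropout": 0.2},
        "optim": {"name": "novograd", "lr": 0.01},
    }
}


def parse_value(text):
    """Interpret an override value as int, then float, else keep it as a string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config, overrides):
    """Return a copy of config with dotted 'a.b.c=value' overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        path, _, raw = item.partition("=")
        keys = path.split(".")
        node = result
        for key in keys[:-1]:
            node = node[key]  # KeyError if an intermediate key is missing
        if keys[-1] not in node:
            raise KeyError(f"cannot override unknown key: {path}")
        node[keys[-1]] = parse_value(raw)
    return result


# Change the encoder's activation and the learning rate without editing the base config.
custom_config = apply_overrides(
    base_config,
    ["model.encoder.activation=selu", "model.optim.lr=0.001"],
)
```

The base configuration is left untouched, so the same defaults can serve many experiments that differ only in their overrides.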
Retrain SOTA Conversational AI Models
Several pre-trained state-of-the-art NeMo models are available from NGC. They were trained for over 100,000 hours on NVIDIA DGX™ across open and proprietary datasets. You can fine-tune these models or modify them with NeMo before training for your use case.
NeMo uses mixed precision on Tensor Cores to speed up training by up to 4.5X on a single GPU versus FP32 precision. You can further scale training to multi-GPU systems and multi-node clusters.
Flexible, Open-Source, Rapidly Expanding Ecosystem
NeMo is built on top of PyTorch and PyTorch Lightning, providing an easy path for researchers to develop and integrate with modules they are already comfortable with. PyTorch and PyTorch Lightning are open-source Python libraries that provide modules to compose models.
To give researchers the flexibility to customize models and modules easily, NeMo integrates with the Hydra framework. Hydra is a popular configuration framework that simplifies the development of complex applications such as conversational AI models.
NeMo is available as open source so that researchers can contribute to and build on it.
Deploy in Real-Time Services
- NVIDIA NeMo (Sample Code)
- Introduction to NeMo for Fast Development of Speech and Language Models (Blog)
- Introducing Jarvis: Framework for GPU-Accelerated Conversational AI Services (Blog)
- NVIDIA NeMo Developer Blogs (Blog)
- Training and Deploying Conversational AI Applications with NeMo and Jarvis (Webinar)