Announcing Megatron for Training Trillion Parameter Models & NVIDIA Riva Availability

Conversational AI is opening up new ways for enterprises in every industry to interact with customers through applications such as real-time transcription, translation, chatbots, and virtual assistants. Building domain-specific interactive applications requires state-of-the-art models, optimizations for real-time performance, and tools to adapt those models to your data. This week at GTC, NVIDIA announced several major breakthroughs in conversational AI that will usher in a new wave of conversational AI applications.


NVIDIA Megatron is a PyTorch-based framework for training giant language models based on the transformer architecture. Larger language models are helping produce superhuman-like responses and are being used in applications such as email phrase completion, document summarization, and live sports commentary. The University of Florida has also used the Megatron framework to develop GatorTron, the world’s largest clinical language model.
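Much of Megatron's ability to train giant transformers comes from splitting individual layers across GPUs. The sketch below simulates, on a single CPU in plain Python, the tensor (intra-layer) model parallelism described in the Megatron-LM paper for a transformer MLP block. The first weight matrix is split by columns and the second by rows, and the partial outputs are summed. The function names, and the summing loop that stands in for an NCCL all-reduce, are illustrative assumptions and not Megatron's actual API.

```python
import math
import random


def matmul(x, w):
    """Multiply an (n x k) matrix by a (k x m) matrix (lists of lists)."""
    k, m = len(w), len(w[0])
    return [[sum(row[t] * w[t][j] for t in range(k)) for j in range(m)] for row in x]


def gelu(v):
    """Exact GeLU activation: 0.5 * v * (1 + erf(v / sqrt(2)))."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def elementwise(f, m):
    return [[f(v) for v in row] for row in m]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def mlp_reference(x, a, b):
    """Unsharded transformer MLP: Z = GeLU(X A) B."""
    return matmul(elementwise(gelu, matmul(x, a)), b)


def mlp_tensor_parallel(x, a, b, world_size):
    """Shard A by columns and B by rows across `world_size` simulated ranks.

    Because GeLU is elementwise, each rank can compute GeLU(X A_r) B_r on its
    own; summing the partial outputs stands in for the all-reduce that a real
    multi-GPU run would perform.
    """
    hidden = len(a[0])
    if hidden % world_size != 0:
        raise ValueError("hidden size must be divisible by world_size")
    shard = hidden // world_size
    total = None
    for rank in range(world_size):
        cols = range(rank * shard, (rank + 1) * shard)
        a_r = [[row[c] for c in cols] for row in a]  # column shard of A
        b_r = [b[c] for c in cols]                    # matching row shard of B
        partial = matmul(elementwise(gelu, matmul(x, a_r)), b_r)
        total = partial if total is None else add(total, partial)
    return total


def random_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
X = random_matrix(2, 4)   # 2 tokens, model width 4
A = random_matrix(4, 8)   # width 4 -> hidden size 8
B = random_matrix(8, 4)   # hidden size 8 -> width 4

ref = mlp_reference(X, A, B)
tp = mlp_tensor_parallel(X, A, B, world_size=4)
max_err = max(abs(p - q) for rp, rq in zip(ref, tp) for p, q in zip(rp, rq))
print(f"max abs difference vs. unsharded MLP: {max_err:.1e}")
```

The two results agree up to floating-point rounding. The benefit of the split is that each rank holds only 1/world_size of the MLP weights, which is what allows layers too large for one GPU to be trained at all.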

Highlights include:

  • Linear scaling of training up to 1 trillion parameters on DGX SuperPOD, using advanced optimizations and parallelization algorithms
  • Built on cuBLAS, NCCL, NVLink, and InfiniBand to train language models on multi-GPU, multi-node systems
  • More than 100x higher throughput when moving from a 1-billion-parameter model on 32 A100 GPUs to a 1-trillion-parameter model on 3,072 A100 GPUs
  • Sustained 50% Tensor Core utilization
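The scaling figures can be sanity-checked with simple arithmetic. The sketch below assumes that "throughput" is measured in the same units (for example, FLOP/s) in both configurations. It also uses the A100's 312 TFLOPS dense FP16/BF16 Tensor Core peak, which is not stated in this post. The helper names are hypothetical.

```python
# Assumption (not stated in the post): an A100's dense FP16/BF16
# Tensor Core peak is 312 TFLOPS.
A100_PEAK_TFLOPS = 312.0


def gpu_scale(small_gpus: int, large_gpus: int) -> float:
    """How many times more GPUs the larger run uses."""
    return large_gpus / small_gpus


def per_gpu_gain(throughput_gain: float, small_gpus: int, large_gpus: int) -> float:
    """Aggregate throughput gain divided by the growth in GPU count."""
    return throughput_gain / gpu_scale(small_gpus, large_gpus)


def sustained_pflops(num_gpus: int, utilization: float,
                     peak_tflops: float = A100_PEAK_TFLOPS) -> float:
    """Aggregate sustained PFLOPS at a given fraction of Tensor Core peak."""
    return num_gpus * utilization * peak_tflops / 1000.0


print(gpu_scale(32, 3072))                      # 96.0
print(round(per_gpu_gain(100, 32, 3072), 3))    # 1.042
print(round(sustained_pflops(3072, 0.50), 1))   # 479.2
```

Going from 32 to 3,072 GPUs is a 96x increase. A throughput gain above 100x therefore means each GPU does slightly more useful work in the larger run. Under the stated peak assumption, 50% utilization across 3,072 A100s works out to roughly 479 PFLOPS sustained.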

Read the technical blog post for more details.
Megatron is available on GitHub.


NVIDIA also announced new capabilities for Riva, a fully accelerated conversational AI framework. These include highly accurate automatic speech recognition, real-time translation for multiple languages, and text-to-speech for creating expressive conversational AI agents.

Highlights include:

  • An out-of-the-box speech recognition model trained on multiple large corpora, with greater than 90% accuracy
  • Transfer Learning Toolkit in TAO to fine-tune models for any domain
  • Real-time translation for five languages at under 100 ms latency per sentence
  • Expressive text-to-speech that delivers 30x higher throughput than Tacotron 2
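The post does not define "accuracy". Speech recognition accuracy is commonly reported as 1 minus the word error rate (WER), so "greater than 90% accuracy" would correspond to a WER below 10%. The sketch below assumes that definition. It computes WER as the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. The example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edits needed to turn the previous reference prefix into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            curr[j] = min(prev[j] + 1,       # delete a reference word
                          curr[j - 1] + 1,   # insert a hypothesis word
                          substitution)      # substitute (or match, cost 0)
        prev = curr
    return prev[-1] / len(ref)


reference = "please book a table for two at seven tonight downtown"
hypothesis = "please book a table for two at eleven tonight downtown"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.2f}, accuracy = {1 - wer:.0%}")  # WER = 0.10, accuracy = 90%
```

Because insertions count as errors, WER can exceed 1.0. Under this definition, "accuracy" can therefore be negative for very poor transcripts.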

These new capabilities will be available in Q2 2021 as part of the ongoing beta program.

The Riva beta currently includes state-of-the-art models pretrained for thousands of hours on NVIDIA DGX systems, the Transfer Learning Toolkit for adapting those models to your domain with zero coding, and optimized end-to-end speech, vision, and language pipelines that run in real time.
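"Real time" is usually quantified with two simple metrics. For streaming speech, the real-time factor is processing time divided by audio duration, and a value below 1.0 means the system keeps up with live audio. For request-style services, such as the sub-100 ms translation figure above, per-request latency percentiles are used. The sketch below assumes these standard definitions. The function names and sample timings are hypothetical and are neither Riva APIs nor Riva measurements.

```python
import math


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def nearest_rank_percentile(values, pct: float) -> float:
    """Nearest-rank percentile (pct in (0, 100]) of a non-empty sample."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100.0 * len(ordered))
    return ordered[max(rank, 1) - 1]


def meets_latency_budget(latencies_ms, budget_ms: float = 100.0, pct: float = 95.0) -> bool:
    """True if the pct-th percentile per-sentence latency is under the budget."""
    return nearest_rank_percentile(latencies_ms, pct) < budget_ms


rtf = real_time_factor(processing_seconds=0.25, audio_seconds=10.0)
print(rtf, rtf < 1.0)  # 0.025 True

sentence_latencies_ms = [42.0, 55.5, 61.0, 73.2, 98.7]  # hypothetical timings
print(meets_latency_budget(sentence_latencies_ms))     # True
```

A percentile is used rather than the mean because a per-sentence budget is only useful if nearly every sentence meets it. A few slow outliers can hide behind a low average.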

To get started with Riva, read this introductory post on building and deploying custom conversational AI models using Riva and the NVIDIA Transfer Learning Toolkit.

Next, try these sample applications for ideas on what you can build with Riva out-of-the-box:

  • Riva Rasa assistant: End-to-end, voice-enabled AI assistant demonstrating the integration of Riva Speech and Rasa
  • Riva Contact App: Peer-to-peer video chat with streaming transcription and named entity recognition
  • Question Answering: Build a QA system with a few lines of Python code using the ready-to-use Riva NLP service
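Question answering of this kind is commonly done with an extractive, BERT-style model. The model scores every context token as a possible answer start and as a possible answer end, and the answer is the highest-scoring valid span. The sketch below assumes that design and shows only the final span-selection step, in plain Python. It does not call Riva, and the logits are made up. Production pipelines also exclude question and special tokens and map subword tokens back to character offsets, which this simplified version omits.

```python
from typing import List, Tuple


def best_answer_span(start_logits: List[float], end_logits: List[float],
                     max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start_logits[s] + end_logits[e]
    subject to s <= e < s + max_answer_len."""
    if not start_logits or len(start_logits) != len(end_logits):
        raise ValueError("need equal-length, non-empty logit lists")
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start and within the length cap.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


context = ["Riva", "runs", "on", "NVIDIA", "GPUs"]
start_logits = [0.1, 0.2, 0.3, 2.5, 0.4]   # made-up model outputs
end_logits = [0.0, 0.1, 0.2, 1.0, 3.0]
s, e, score = best_answer_span(start_logits, end_logits)
print(" ".join(context[s:e + 1]), score)   # NVIDIA GPUs 5.5
```

The brute-force double loop is O(n × max_answer_len), which is fine for a single passage. The length cap prevents the model from returning implausibly long answers.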

Join us at NVIDIA GTC, free to attend, on April 13 for our session "Building and Deploying a Custom Conversational AI App with NVIDIA Transfer Learning Toolkit and Riva" to learn more.