
Announcing Megatron for Training Trillion Parameter Models and NVIDIA Riva Availability

Conversational AI is opening new ways for enterprises to interact with customers in every industry using applications like real-time transcription, translation, chatbots, and virtual assistants. Building domain-specific interactive applications requires state-of-the-art models, optimizations for real-time performance, and tools to adapt those models with your data.

This week at GTC, NVIDIA announced several major breakthroughs in conversational AI that will usher in a new wave of applications.


NVIDIA Megatron is a PyTorch-based framework for training giant language models based on the transformer architecture. Larger language models are helping to produce superhuman responses and are being used in applications such as email phrase completion, document summarization, and live sports commentary.

The Megatron framework has also been harnessed by the University of Florida to develop GatorTron, the world’s largest clinical language model.

Highlights include the following:

  • Linearly scale training up to 1T parameters on DGX SuperPOD with advanced optimizations and parallelization algorithms
  • Built on cuBLAS, NCCL, NVLink, and InfiniBand to train a language model on multi-GPU, multi-node systems
  • Improve throughput by more than 100x when moving from a 1B-parameter model on 32 A100 GPUs to a 1T-parameter model on 3072 A100 GPUs
  • Achieve sustained 50% utilization of Tensor Cores
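
To put the throughput figure in context, the GPU counts quoted above can be compared directly with the reported gain. The following is a minimal sketch of that arithmetic, not a benchmark. It assumes that "throughput" means aggregate throughput across all GPUs and treats "more than 100x" as a lower bound of exactly 100; the function and variable names are illustrative only.

```python
# Figures quoted in the highlights above.
SMALL_GPUS = 32         # A100 GPUs used for the 1B-parameter model
LARGE_GPUS = 3072       # A100 GPUs used for the 1T-parameter model
THROUGHPUT_GAIN = 100   # assumed lower bound for "more than 100x"


def scaling_summary(small_gpus, large_gpus, throughput_gain):
    """Return (GPU count ratio, implied minimum per-GPU throughput ratio)."""
    gpu_ratio = large_gpus / small_gpus
    # Assumption: throughput is aggregate, so dividing by the GPU ratio
    # gives the change in throughput per GPU.
    per_gpu_gain = throughput_gain / gpu_ratio
    return gpu_ratio, per_gpu_gain


gpu_ratio, per_gpu_gain = scaling_summary(SMALL_GPUS, LARGE_GPUS, THROUGHPUT_GAIN)
print(f"GPU count grows {gpu_ratio:.0f}x")
print(f"Implied per-GPU throughput ratio: at least {per_gpu_gain:.2f}x")
```

Under these assumptions, the GPU count grows 96x while throughput grows by more than 100x, so per-GPU throughput does not drop as the model and cluster scale up.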

For more information, see Scaling Language Model Training to a Trillion Parameters Using Megatron.

Megatron is available on GitHub.


NVIDIA also announced new achievements for Riva, a fully accelerated conversational AI framework, including highly accurate automatic speech recognition, real-time translation for multiple languages, and text-to-speech capabilities to create expressive conversational AI agents.

Highlights include the following:

  • Out-of-the-box speech recognition model trained on multiple large corpora with greater than 90% accuracy
  • Transfer Learning Toolkit in TAO to fine-tune models on any domain
  • Real-time translation for five languages that runs with under 100 ms latency per sentence
  • Expressive text-to-speech that delivers 30x higher throughput compared with Tacotron2

These new capabilities will be available in Q2 2021 as part of the ongoing beta program.

The Riva Beta currently includes state-of-the-art models pretrained for thousands of hours on NVIDIA DGX; Transfer Learning Toolkit for adapting those models to your domain with zero coding; and optimized end-to-end speech, vision, and language pipelines that run in real time.

To get started with Riva, read this introductory post on building and deploying custom conversational AI models using Riva and NVIDIA Transfer Learning Toolkit. For more information, see Building and Deploying Conversational AI Models Using NVIDIA TAO Toolkit.

Next, try these sample applications for ideas on what you can build with Riva out of the box:

  • Riva Rasa assistant: End-to-end voice-enabled AI assistant demonstrating the integration of Riva Speech and Rasa.
  • Riva Contact App: Peer-to-peer video chat with streaming transcription and named entity recognition.
  • Question Answering: Build a QA system with a few lines of Python code using the ready-to-use Riva NLP service.

For more information, join us at NVIDIA GTC for free on April 13 for our session,
