NVIDIA Riva is a GPU-accelerated SDK for building multimodal conversational AI applications that deliver real-time performance on GPUs.

What is NVIDIA Riva?

Riva is a fully accelerated SDK for building multimodal conversational AI applications that use an end-to-end deep learning pipeline. Enterprise developers can fine-tune state-of-the-art models on their own data to capture the context of their domain, then optimize those models for inference to offer end-to-end real-time services that run in less than 300 milliseconds (ms) and deliver 7X higher throughput on GPUs compared with CPUs.

The Riva SDK includes pre-trained conversational AI models, the NVIDIA TAO Toolkit, and optimized end-to-end skills for speech, vision, and natural language processing (NLP) tasks.

Fusing vision, audio, and other sensor inputs simultaneously enables multi-user, multi-context conversations in applications such as virtual assistants, multi-user diarization, and call center assistants.

Riva-based applications have been optimized to maximize performance on the NVIDIA EGX™ platform in the cloud, in the data center, and at the edge.


High Accuracy

Customize state-of-the-art pretrained models that have been trained for over 100,000 hours on NVIDIA DGX™ on industry-specific jargon.

Real-Time Performance

Run end-to-end deep learning-based conversational AI applications in under 300 milliseconds (ms), the latency threshold for real-time performance.

Automated Deployment

Use one command to deploy conversational AI services in the cloud or in the data center.

State-of-the-Art Interactive Conversational AI

As conversational AI applications expand globally, they need to understand industry-specific jargon to translate and interact with humans more naturally—all in real time. Riva includes world-class automatic speech recognition (ASR) that can be customized across domains, translation to multiple languages, and controllable text-to-speech (TTS) that make the applications more expressive.

Try Riva Speech Recognition in Action

In this demo, you’ll see Riva speech recognition deliver highly accurate transcription in real time.

You can provide an input through your microphone or upload a .wav file from your device.

The duration of each sample is limited to 30 seconds.
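If you prepare a .wav file ahead of time, you can check its length before uploading. The sketch below uses Python's standard `wave` module; the function names, the 16 kHz sample rate of the generated test clip, and the choice to treat exactly 30 seconds as acceptable are assumptions made for illustration, not part of the demo.

```python
import io
import wave

MAX_SECONDS = 30.0  # sample limit stated for the demo


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_demo_limit(source, max_seconds: float = MAX_SECONDS) -> bool:
    """True when the clip is no longer than the sample limit (assumed inclusive)."""
    return wav_duration_seconds(source) <= max_seconds


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, used only as sample input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer
```

For a file on disk, pass its path instead of a buffer, for example `fits_demo_limit("sample.wav")`, where `sample.wav` is a placeholder name.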

World-Class Speech Recognition

Real-Time Machine Translation

Controllable Text-to-Speech

Riva SDK Overview

Customize for Your Domain with TAO Toolkit

TAO Toolkit offers a zero-coding approach to fine-tuning pretrained deep learning models, cutting model development time by up to 10X versus training from scratch. Developers and machine learning practitioners use TAO Toolkit to maximize accuracy for their domain-specific applications by training on their custom data before deploying to Riva for inference in production.

Pre-trained models and TAO Toolkit are available in the NVIDIA NGC™ catalog.

Figure 1: Train and deploy an end-to-end conversational AI pipeline using pretrained models, TAO Toolkit and Riva.

Optimize Task-Specific Skills

Figure 2: Riva AI skills.

Access high-performance skills for tasks such as speech recognition, intent recognition, speech synthesis, pose estimation, gaze detection, and facial landmark detection through a simple API.

Pipelines for each skill can be fused to build new skills. Each pipeline is performance-tuned to deliver the highest performance possible and can be customized for your specific use case.
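To make the idea of fusing pipelines concrete, here is a minimal sketch in plain Python. It does not use the Riva API: `transcribe`, `classify_intent`, and `fuse` are hypothetical stand-ins, and the keyword-based intent lookup is only a placeholder for a real model.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for two skills; a real application would call the
# corresponding services here instead.
def transcribe(audio: bytes) -> str:
    """Pretend speech recognition: decodes UTF-8 bytes as if they were speech."""
    return audio.decode("utf-8")


def classify_intent(text: str) -> Dict[str, str]:
    """Pretend intent recognition based on a simple keyword lookup."""
    intent = "book_table" if "reservation" in text.lower() else "unknown"
    return {"text": text, "intent": intent}


def fuse(*stages: Callable) -> Callable:
    """Chain stages so that the output of each one feeds the next."""
    def fused(value):
        for stage in stages:
            value = stage(value)
        return value
    return fused


# A new "skill" built by fusing the two pipelines above.
speech_to_intent = fuse(transcribe, classify_intent)
```

Calling `speech_to_intent(b"I need a reservation for two")` runs both stages in order and returns the recognized text together with its intent label.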

Build and Deploy Skills Easily

Automate the steps that go from pretrained models to optimized skills deployed in the cloud and in the data center. Under the hood, Riva applies powerful NVIDIA TensorRT™ optimizations to models, configures the NVIDIA Triton™ Inference Server, and exposes the models as a service through a standard API.

To deploy, you can use a single command to download, set up, and run the entire Riva application or individual services through Helm charts on Kubernetes clusters. The Helm charts can be customized for your use case and are available in NGC.

Figure 3: Helm command to deploy models to production.
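As a rough sketch of what a Helm-based deployment call looks like, the helper below assembles a `helm install` command with an optional namespace and `--set` overrides. The release name, chart path, namespace, and override key are placeholders chosen for illustration and are not taken from the Riva Helm chart; check the chart in NGC for the actual values.

```python
import shlex
from typing import Dict, List, Optional


def build_helm_install(release: str, chart: str,
                       namespace: Optional[str] = None,
                       overrides: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble a `helm install` argument list without running it."""
    command = ["helm", "install", release, chart]
    if namespace:
        command += ["--namespace", namespace]
    # Sort overrides so the generated command is deterministic.
    for key, value in sorted((overrides or {}).items()):
        command += ["--set", f"{key}={value}"]
    return command


# Placeholder names only: none of these values come from the Riva chart.
command = build_helm_install(
    release="riva-demo",
    chart="./riva-chart",
    namespace="riva",
    overrides={"replicaCount": "1"},
)
print(shlex.join(command))
```

On a machine with Helm installed and access to a Kubernetes cluster, the same list can be passed to `subprocess.run(command, check=True)` to run the deployment.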

Develop New Multimodal Skills

Figure 4: Multimodal application with multiple users and contexts.

Build multimodal skills such as multi-speaker transcription, chatbots, gesture recognition, and look-to-talk for your conversational AI applications.

With Riva, you can build multimodal pilot apps by fusing speech, language understanding, and vision pipelines with a dialog manager that supports multi-user, multi-context conversations.

Leading Adopters Across All Verticals

T-Mobile uses Riva to deliver an exceptional experience to its customers.

Ribbon Communications analyzes contact center calls using Riva to make smarter business decisions.

NTT Resonant used Riva to build a restaurant reservation system.

InstaDeep developed an Arabic virtual assistant with the help of NVIDIA Riva.


Get Started with NVIDIA Riva

Understand the key features in Riva that help you build multimodal conversational AI services.

Fine-Tune Models with TAO Toolkit

Learn to fine-tune state-of-the-art models on your data so that they understand domain-specific jargon.

Build Conversational AI Applications

Develop your first conversational AI application that minimizes latency and maximizes throughput on GPUs.

NVIDIA Riva is available from the NVIDIA NGC catalog for members of the NVIDIA Developer Program.

Get Started