NVIDIA Riva is a GPU-accelerated SDK for building multimodal conversational AI applications that deliver real-time performance on GPUs.


Riva is a fully accelerated SDK for building multimodal conversational AI applications that use an end-to-end deep learning pipeline. Enterprise developers can easily fine-tune state-of-the-art models on their own data to achieve a deeper understanding of their specific context, and optimize them for inference to offer end-to-end real-time services that run in under 300 milliseconds (ms) and deliver 7X higher throughput on GPUs than on CPUs.

The Riva SDK includes pre-trained conversational AI models, the NVIDIA TAO Toolkit, and optimized end-to-end skills for speech, vision, and natural language processing (NLP) tasks.

Fusing vision, audio, and other sensor inputs simultaneously enables capabilities such as multi-user, multi-context conversations in applications like virtual assistants, multi-user diarization, and call center assistants.

Riva-based applications have been optimized to maximize performance on the NVIDIA EGX™ platform in the cloud, in the data center, and at the edge.

High Accuracy

Customize state-of-the-art pretrained models, trained for over 100,000 hours on NVIDIA DGX™ systems, on industry-specific jargon.

Real-Time Performance

Run end-to-end deep learning-based conversational AI applications in under 300 milliseconds (ms), the latency threshold for real-time performance.
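As a rough illustration of that budget, the per-stage latencies of a sequential pipeline must sum to under 300 ms. The sketch below uses made-up stage numbers, not measured Riva figures:

```python
# Hypothetical per-stage latencies (ms) for a sequential ASR -> NLU -> TTS
# pipeline; the numbers are illustrative, not measured Riva figures.
STAGE_LATENCY_MS = {"asr": 120, "nlu": 40, "tts": 110}
REAL_TIME_BUDGET_MS = 300  # latency threshold for real-time performance

def pipeline_latency(stages: dict) -> int:
    """Total latency of a pipeline whose stages run back to back."""
    return sum(stages.values())

def meets_real_time(stages: dict, budget: int = REAL_TIME_BUDGET_MS) -> bool:
    """True if the end-to-end pipeline fits within the real-time budget."""
    return pipeline_latency(stages) <= budget

print(pipeline_latency(STAGE_LATENCY_MS))  # 270
print(meets_real_time(STAGE_LATENCY_MS))   # True
```

Any stage that blows its share of the budget pushes the whole pipeline past the real-time threshold, which is why each stage is accelerated end to end.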

Automated Deployment

Use one command to deploy conversational AI services in the cloud or in the data center.

State-of-the-Art Interactive Conversational AI

As conversational AI applications expand globally, they need to understand industry-specific jargon to translate and interact with humans more naturally—all in real time. Riva includes world-class automatic speech recognition (ASR) that can be customized across domains, translation to multiple languages, and controllable text-to-speech (TTS) that makes applications more expressive.

World-Class Speech Recognition

Real-Time Machine Translation

Controllable Text-to-Speech

Customize for Your Domain with TAO Toolkit

TAO Toolkit offers a zero-coding approach to fine-tuning pretrained deep learning models, accelerating model development time up to 10X versus training from scratch. Developers and machine learning practitioners use TAO Toolkit to maximize accuracy for their domain-specific applications by training on their custom data before deploying to Riva for inference in production.

Pre-trained models and TAO Toolkit are available in the NVIDIA NGC™ catalog.

Figure 1: Train and deploy an end-to-end conversational AI pipeline using pretrained models, TAO Toolkit, and Riva.

Develop New Multimodal Skills

Figure 2: Multimodal application with multiple users and contexts.

Build multimodal skills such as multi-speaker transcription, chatbots, gesture recognition, and look-to-talk for your conversational AI applications.

With Riva, you can build multimodal pilot apps by fusing speech, language understanding, and vision pipelines along with a dialog manager that supports multiple users and contexts.
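The core idea of multi-user, multi-context dialog management is that each (user, context) pair keeps its own conversation state. The sketch below is a conceptual illustration of that idea, not the actual Riva dialog-manager API:

```python
# Conceptual sketch of a dialog manager that keeps separate state per
# (user, context) pair -- not the actual Riva dialog-manager API.
from collections import defaultdict

class DialogManager:
    def __init__(self):
        # Each (user_id, context_id) pair gets its own conversation history,
        # so two users (or one user in two contexts) never share state.
        self.history = defaultdict(list)

    def handle(self, user_id: str, context_id: str, utterance: str) -> list:
        """Record an utterance and return that conversation's full history."""
        turns = self.history[(user_id, context_id)]
        turns.append(utterance)
        return turns

dm = DialogManager()
dm.handle("alice", "weather", "What's the forecast?")
dm.handle("bob", "weather", "Same question here.")
print(len(dm.handle("alice", "weather", "And tomorrow?")))  # 2: only Alice's turns
```

Keying state on both user and context is what lets one assistant hold several independent conversations at once, as in Figure 2.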

Optimize Task-Specific Skills

Access high-performance skills for tasks such as speech recognition, intent recognition, speech synthesis, pose estimation, gaze detection, and facial landmark detection through a simple API.

Pipelines for each skill can be fused to build new skills. Each pipeline is tuned to deliver the highest possible performance and can be customized for your specific use case.
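Fusing pipelines amounts to chaining the output of one skill into the input of another. The sketch below illustrates the pattern with hypothetical stand-in functions; `asr()` and `intent()` are placeholders, not Riva API calls:

```python
# Conceptual sketch of fusing two skill pipelines into a new one.
# asr() and intent() are hypothetical stand-ins, not Riva API calls.
def asr(audio: str) -> str:
    # Stand-in for speech recognition: treat the input string as its transcript.
    return audio.lower()

def intent(text: str) -> str:
    # Stand-in for intent recognition over the transcript.
    return "greeting" if "hello" in text else "unknown"

def voice_command_skill(audio: str) -> str:
    """A new skill built by chaining the ASR and intent pipelines."""
    return intent(asr(audio))

print(voice_command_skill("Hello Riva"))  # greeting
```

Because each stage exposes a simple input/output contract, the same composition works for any pair of skills whose types line up.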

Figure 3: Riva AI skills.

Build and Deploy Skills Easily

Figure 4: Helm command to deploy models to production.

Automate the steps that go from pretrained models to optimized skills deployed in the cloud and in the data center. Under the hood, Riva applies powerful NVIDIA TensorRT™ optimizations to models, configures the NVIDIA Triton™ Inference Server, and exposes the models as a service through a standard API.

To deploy, you can use a single command to download, set up, and run the entire Riva application or individual services through Helm charts on Kubernetes clusters. The Helm charts can be customized for your use case and are available in NGC.

Leading Adopters Across All Verticals

T-Mobile uses Riva to deliver an exceptional experience to their customers.

Learn more >

Ribbon Communications analyzes contact center calls using Riva to make smarter business decisions.

Learn more >

NTT Resonant used Riva to build a restaurant reservation system.

Learn more >

InstaDeep developed an Arabic virtual assistant with the help of NVIDIA Riva.

Learn more >


Get Started with NVIDIA Riva

Understand the key features in Riva that help you build multimodal conversational AI services.

Read blog >

Fine-Tune Models with TAO Toolkit

Learn to fine-tune state-of-the-art models on your data so they understand domain-specific jargon.

Learn more >

Build Conversational AI Applications

Develop your first conversational AI application that minimizes latency and maximizes throughput on GPUs.

Watch now >

NVIDIA Riva is available from the NVIDIA NGC catalog for members of the NVIDIA Developer Program.

Get Started