NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications that are customized for your use case and deliver real-time performance.



State-of-the-Art AI

Built on a decade of AI innovations by NVIDIA across hardware, model architectures, training techniques, inference optimizations, and deployment solutions.

Fully Customizable

Flexibility at every step, from modifying model architectures to fine-tuning models on your data and customizing pipelines, as well as the ability to deploy on any platform.

Leading Performance

Continued optimizations across the entire stack, from models to software to hardware, have delivered a 12X performance gain over the previous generation.

World-Class AI Speech

As speech-based applications expand globally, they need to be able to process industry-specific jargon in order to listen and respond to humans more naturally—all in real time. Rising to the challenge, Riva includes world-class automatic speech recognition (ASR) that can be customized across domains, as well as controllable text-to-speech that makes applications more expressive.



What is NVIDIA Riva?

Simple End-to-End Workflow for Speech

Riva offers pre-trained speech models in NVIDIA NGC™ that can be fine-tuned with the TAO Toolkit on a custom data set, accelerating the development of domain-specific models by 10X.

TAO models can be easily exported, optimized, and deployed as a speech service on premises or in the cloud with a single command using Helm charts.
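As an illustration of what that single-command deployment might look like (the chart name, repository URL, and values file below are placeholders for this sketch, not the exact commands from NVIDIA's documentation; consult the NGC catalog for the actual chart location and version):

```shell
# Hypothetical Helm deployment of the Riva speech service.
# Chart name and repo URL are placeholders -- check the NGC catalog
# for the real chart location and version.
helm repo add riva-charts https://helm.ngc.nvidia.com/nvidia/riva   # placeholder URL
helm repo update

# values.yaml would select which services (ASR, TTS) and which models to enable.
helm install riva-api riva-charts/riva-api -f values.yaml
```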

Riva’s high-performance inference is powered by NVIDIA TensorRT™ optimizations and served using the NVIDIA Triton™ Inference Server.

Riva services are available as gRPC-based microservices for low-latency streaming, as well as high-throughput offline use cases.

Riva is fully containerized and can easily scale to hundreds or even thousands of parallel streams.
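The streaming pattern behind those gRPC microservices can be sketched in plain Python: the client cuts audio into small fixed-size chunks and sends them as a request stream, receiving interim transcripts back with minimal latency. The sketch below is self-contained for illustration only; the 100 ms chunk size and the stand-in `recognize_stream` function are assumptions, and a real client would use Riva's generated gRPC stubs instead:

```python
from typing import Iterator

SAMPLE_RATE = 16_000   # 16 kHz mono PCM, a common ASR input format
BYTES_PER_SAMPLE = 2   # 16-bit samples
CHUNK_MS = 100         # send ~100 ms of audio per streaming request

def audio_chunks(pcm: bytes) -> Iterator[bytes]:
    """Yield fixed-size chunks of raw PCM, as a streaming client would."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]

def recognize_stream(pcm: bytes) -> int:
    """Stand-in for a gRPC streaming call: counts the requests it would send."""
    return sum(1 for _ in audio_chunks(pcm))

# One second of silence -> ten 100 ms streaming requests.
one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)
print(recognize_stream(one_second))  # -> 10
```

Small chunks keep end-to-end latency low in streaming mode; offline (batch) use cases can instead send the whole recording in one request for maximum throughput.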


Figure 1: Train and deploy an end-to-end speech AI pipeline using pretrained models, the TAO Toolkit, and Riva.

Speech Recognition

Figure 2: Speech recognition pipeline

Riva offers out-of-the-box world-class speech recognition that can be customized for any domain or deployment platform.

The service handles hundreds to thousands of audio streams as input and returns streaming transcripts with minimal latency.

Riva pipelines are trained on a variety of domain-specific data and can be further tuned for different languages, accents, domains, vocabularies, and contexts.

The end-to-end pipeline is GPU-optimized and includes customizable feature extraction, acoustic and language models, a decoder, and punctuation.

Key Features Include:

  • Multiple model architectures for different deployment environments
  • Models trained on hundreds of thousands of hours of speech data on NVIDIA DGX
  • Automatic punctuation
  • Word-level timestamps
  • Inverse text normalization to improve readability of output
  • TensorRT optimizations to minimize latency and maximize throughput
  • Optimized for A100, V100 and T4 GPUs
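To illustrate what inverse text normalization does, the toy function below converts a few spoken-form patterns into written form. It is a rule-based sketch covering only a handful of cases for illustration; Riva's actual ITN is far more general:

```python
# Toy inverse text normalization: spoken-form tokens -> written form.
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def inverse_normalize(text: str) -> str:
    out, i = [], 0
    words = text.split()
    while i < len(words):
        w = words[i]
        if w in TENS and i + 1 < len(words) and words[i + 1] in UNITS:
            out.append(str(TENS[w] + UNITS[words[i + 1]]))  # "ninety five" -> "95"
            i += 2
        elif w in TENS:
            out.append(str(TENS[w])); i += 1
        elif w in UNITS:
            out.append(str(UNITS[w])); i += 1
        elif w == "percent" and out and out[-1].isdigit():
            out[-1] += "%"; i += 1                          # "95 percent" -> "95%"
        else:
            out.append(w); i += 1
    return " ".join(out)

print(inverse_normalize("the model reached ninety five percent accuracy"))
# -> the model reached 95% accuracy
```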


Text-to-Speech

Riva offers human-like neural text-to-speech voices built on state-of-the-art spectrogram-generation and vocoder models. Riva pipelines are customizable and optimized to run efficiently in real time on GPUs.

Riva TTS takes raw text as input; in streaming mode it returns audio chunks as soon as they are generated, while in batch mode it returns audio at the end of the entire sequence.

The Riva custom voice feature makes it possible for any enterprise to create a unique voice for their brand, virtual assistant, or call center with only 30 minutes of data.

Creating a new voice with Riva requires less than one day of training on an A100 GPU versus weeks with alternative technologies.
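The difference between streaming and batch delivery can be sketched with a stand-in synthesizer. The chunk size and the fake vocoder below are assumptions for illustration only, not Riva's API; the point is that streaming consumers get audio chunk by chunk, while batch consumers wait for the whole sequence:

```python
from typing import Iterator

def synthesize_chunks(text: str, samples_per_chunk: int = 4) -> Iterator[bytes]:
    """Stand-in vocoder: emit one dummy audio sample per character,
    yielded chunk by chunk as a streaming TTS service would."""
    buf = bytearray()
    for ch in text:
        buf.append(ord(ch) % 256)           # fake "audio sample"
        if len(buf) == samples_per_chunk:
            yield bytes(buf)
            buf.clear()
    if buf:
        yield bytes(buf)                    # final partial chunk

# Streaming mode: play each chunk as soon as it is ready.
chunks = list(synthesize_chunks("hello riva"))

# Batch mode: wait for the full sequence, then concatenate.
audio = b"".join(chunks)

print(len(chunks), len(audio))  # -> 3 10
```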

Key Features:

  • State-of-the-art (SOTA) models that generate expressive neural voices
  • Robust pipeline that makes it easy to fine-tune voice and accent
  • Fine-grained control over pitch and duration during inference
  • 6X higher inference performance versus existing technologies
  • TensorRT optimizations to minimize latency and maximize throughput
  • Support for A100, V100, and T4 GPUs
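Pitch and rate adjustments of this kind are typically expressed with SSML prosody markup. The fragment below is a generic SSML sketch; the exact elements, attributes, and value ranges Riva accepts should be checked against its documentation:

```xml
<speak>
  <!-- Illustrative SSML: raise pitch slightly and slow the speaking rate. -->
  <prosody pitch="+5%" rate="90%">
    Thank you for calling. How can I help you today?
  </prosody>
</speak>
```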

Figure 3: Text-to-speech pipeline

Riva Enterprise

For large-scale deployment and full-service support, NVIDIA offers Riva Enterprise.


Get Started with NVIDIA Riva

Understand the key features in Riva that help you build speech AI services.


Fine-Tune Models with TAO Toolkit

Learn to fine-tune models on your data and achieve state-of-the-art accuracy on domain-specific jargon.


Build Conversational AI Applications

Develop your first conversational AI application that minimizes latency and maximizes throughput on GPUs.


NVIDIA Riva is available from the NVIDIA NGC catalog for members of the NVIDIA Developer Program.
