

NVIDIA® Riva is a GPU-accelerated SDK for building Speech AI applications that are customized for your use case and deliver real-time performance.


Benefits of Riva

Built on State-of-the-Art NVIDIA AI

Riva is part of the NVIDIA AI platform, which has been built on a decade of AI innovations by NVIDIA across model architectures, training techniques, inference optimizations, and deployment solutions.

Fully Customizable

Flexibility at every step, from modifying model architectures to fine-tuning models on your data and customizing pipelines, as well as the ability to deploy on any platform.

Leading Performance

Continued optimizations across the entire stack, from models to software to hardware, have delivered a 12X performance gain over the previous generation.

World-Class Speech AI

As speech-based applications are adopted globally, solutions need to interact with humans across many languages. Speech AI apps need to understand industry-specific jargon and respond naturally in real time. Riva includes world-class automatic speech recognition (ASR) and text-to-speech (TTS) that run in real time.

Try NVIDIA Riva Automatic Speech Recognition

In this demo, you'll see Riva speech recognition deliver highly accurate transcription in real time. You can provide an input through your microphone or upload a .wav file from your device.

The duration of each sample is limited to 30 seconds.


Try NVIDIA Riva Text-to-Speech

If you're looking to add voice to your interactive virtual assistant, modern home device, or reading assistant for the visually impaired or for people with a reading disability, try Riva's out-of-the-box English female or male voice.

Hear the natural-sounding and expressive voices created using Riva's state-of-the-art neural speech synthesis models.


Your use of Riva Voice Recognition and Riva Text-to-Speech is subject to our Terms of Use. Your data will be used to improve NVIDIA products and services.

Domain-Specific Automatic Speech Recognition


What Is NVIDIA Riva?

Simple End-to-End Workflow for Speech

Riva offers pre-trained speech models in NVIDIA NGC™ that can be fine-tuned with the TAO Toolkit on a custom data set, accelerating the development of domain-specific models by 10X.

TAO models can be easily exported, optimized, and deployed as a speech service on premises or in the cloud with a single command using Helm charts.

Riva’s high-performance inference is powered by NVIDIA TensorRT™ optimizations and served using the NVIDIA Triton™ Inference Server, both of which are part of the NVIDIA AI platform.

Riva services are available as gRPC-based microservices for low-latency streaming, as well as high-throughput offline use cases.
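
To make the difference between streaming and offline use concrete, here is a minimal sketch in plain Python. It does not call the Riva API. The function name `chunk_pcm`, the 100 ms chunk size, and the 16 kHz, 16-bit mono format are all assumptions chosen for illustration. In a streaming setup a client would typically send audio in small chunks like these, while an offline request carries the whole buffer at once.

```python
from typing import Iterator


def chunk_pcm(pcm: bytes, sample_rate_hz: int = 16000, chunk_ms: int = 100,
              bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield fixed-duration chunks of mono PCM audio (hypothetical helper)."""
    samples_per_chunk = sample_rate_hz * chunk_ms // 1000
    if samples_per_chunk <= 0:
        raise ValueError("chunk duration is too short for this sample rate")
    chunk_bytes = samples_per_chunk * bytes_per_sample
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of 16 kHz, 16-bit mono silence stands in for real audio.
audio = bytes(16000 * 2)

# Streaming mode: many small requests, so partial results can come back early.
streaming_requests = list(chunk_pcm(audio, chunk_ms=100))

# Offline mode: the entire buffer is sent as a single request.
offline_request = audio

print(len(streaming_requests), len(streaming_requests[0]))
```

Smaller chunks generally lower the delay before the first partial result at the cost of more requests per second of audio, which is the usual trade-off between the low-latency and high-throughput modes.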

Riva is fully containerized and can easily scale to hundreds or thousands of parallel streams.

Figure 1: Train and deploy an end-to-end speech AI pipeline using pretrained models, the TAO Toolkit, and Riva.

Figure 2: Automatic speech recognition pipeline

Automatic Speech Recognition

Riva offers out-of-the-box world-class automatic speech recognition (ASR) that can be customized for any domain or deployment platform.

The service handles hundreds to thousands of audio streams as input and returns streaming transcripts with minimal latency.

Riva pipelines are trained on a variety of domain-specific data and can be further tuned for different languages, accents, domains, vocabulary, and context.

The end-to-end pipeline is GPU-optimized and includes feature extraction, acoustic and language models, a decoder, and punctuation, all of which can be customized.

Key Features Include:

  • Multiple model architectures for different deployment environments
  • Models trained for hundreds of thousands of hours on NVIDIA DGX
  • Support for English, Spanish, German, and Russian
  • Automatic punctuation
  • Word-level timestamps
  • Inverse text normalization to improve readability of output
  • NVIDIA TensorRT optimizations, part of the NVIDIA AI platform, to minimize latency and maximize throughput
  • Optimized for A100, V100 and T4 GPUs
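
As an illustration of what inverse text normalization does to a raw transcript, the following toy sketch converts spelled-out numbers from 0 to 99 into digits. It is not Riva's implementation. The function name `inverse_normalize` and the lookup tables are hypothetical, and production systems handle far more cases (dates, currency, ordinals, and words such as "one" used as a pronoun).

```python
# Toy inverse text normalization: spelled-out numbers 0-99 become digits.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def inverse_normalize(text: str) -> str:
    """Rewrite number words in a lowercase transcript as digits (hypothetical helper)."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in TENS:
            value = TENS[word]
            # Absorb a following unit from 1 to 9, e.g. "twenty" + "five" -> 25.
            if i + 1 < len(words) and 1 <= UNITS.get(words[i + 1], 0) <= 9:
                value += UNITS[words[i + 1]]
                i += 1
            out.append(str(value))
        elif word in UNITS:
            out.append(str(UNITS[word]))
        else:
            out.append(word)
        i += 1
    return " ".join(out)


print(inverse_normalize("the meeting starts in twenty five minutes on floor three"))
# prints: the meeting starts in 25 minutes on floor 3
```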


Text-to-Speech

Riva offers human-like text-to-speech (TTS) neural voices that use state-of-the-art spectrogram generation and vocoder models. Riva pipelines are customizable and optimized to run efficiently in real time on GPUs.

Riva TTS takes raw text as input and can return audio chunks as soon as they are generated in streaming mode, or at the end of the entire sequence in batch mode.
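
The two return modes can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: `fake_synthesize` is a stand-in that returns silence rather than a real model, the per-sentence chunking is just one plausible way to split the work, and none of these names come from the Riva API.

```python
from typing import Iterator, List

SAMPLE_RATE_HZ = 22050  # assumed output sample rate for this sketch


def fake_synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in for a TTS model: 16-bit silence, 50 ms per character."""
    num_samples = SAMPLE_RATE_HZ * 50 * len(sentence) // 1000
    return bytes(num_samples * 2)


def split_sentences(text: str) -> List[str]:
    """Naive split on periods; a real pipeline would use proper text processing."""
    return [part.strip() for part in text.split(".") if part.strip()]


def synthesize_streaming(text: str) -> Iterator[bytes]:
    """Streaming mode: yield each audio chunk as soon as it is generated."""
    for sentence in split_sentences(text):
        yield fake_synthesize(sentence)


def synthesize_batch(text: str) -> bytes:
    """Batch mode: return all audio only after the entire sequence is done."""
    return b"".join(synthesize_streaming(text))


text = "Hello there. How can I help you today."
chunks = list(synthesize_streaming(text))
full_audio = synthesize_batch(text)

print(len(chunks), len(full_audio) == sum(len(c) for c in chunks))
```

With streaming, playback can begin once the first chunk arrives, while batch mode delivers the same audio only after the last sentence has been processed.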

The Riva custom voice feature makes it possible for any enterprise to create a unique voice for its brand, virtual assistant, or call center with only 30 minutes of data.

Creating a new voice with Riva requires less than one day of training on an A100 GPU versus weeks with alternative technologies.

Key Features:

  • State-of-the-art models generate expressive neural voices
  • Robust pipeline makes it easy to fine-tune voice and accent
  • Fine-grained control over voice pitch and duration for expressivity
  • 12X higher inference performance versus existing technologies
  • NVIDIA TensorRT optimizations, part of the NVIDIA AI platform, to minimize latency and maximize throughput
  • Support for NVIDIA A100, V100 and T4 GPUs

Figure 3: Text-to-speech pipeline

Riva Enterprise

For large-scale deployment and full-service support, NVIDIA offers Riva Enterprise.


Customer Stories


With NVIDIA Riva, RingCentral achieved unparalleled real-time transcription accuracy for video meetings, serving millions of users worldwide with diverse accents and domain-specific jargon.


T-Mobile uses NVIDIA Riva ASR in its call center to accurately transcribe customer conversations and provide real-time recommendations that help agents resolve customer queries quickly.


Tarteel uses NVIDIA Riva and NVIDIA NeMo to provide real-time feedback on Quran recitation at scale, enabling Muslims, instructors, content creators, and researchers to engage with the Quran.


Data Monsters used NVIDIA Riva to add a speech pipeline to the Plabook app that helps students read, assesses reading accuracy at the phoneme level, and provides individualized feedback.


Floatbot uses NVIDIA Riva and NVIDIA TAO for its customized Singaporean English voice AI applications, automating call centers for insurance carriers and finance clients globally.



Introductory Blog

Understand the key features in Riva that help you build speech AI services.


Starter Kit

Get everything you need to start developing speech AI applications with NVIDIA Riva, including tutorials, Jupyter notebooks, and documentation.



Learn how you can leverage NVIDIA AI to build speech AI applications that deliver world-class accuracy while running in real time across thousands of streams.


NVIDIA Riva is available from the NVIDIA NGC catalog for members of the NVIDIA Developer Program.
