
NVIDIA Riva

NVIDIA® Riva is a GPU-accelerated speech AI SDK for building and deploying fully customizable, real-time AI pipelines that deliver world-class accuracy in all clouds, on-premises, at the edge and on embedded devices.


Explore NVIDIA Riva benefits.

Built on State-of-the-Art NVIDIA AI

Riva is part of the NVIDIA AI platform, which has been built on a decade of AI innovations by NVIDIA across model architectures, training techniques, inference optimizations, and deployment solutions.

Fully Customizable

Flexibility at every step, from modifying model architectures to fine-tuning models on your data and customizing pipelines, as well as the ability to deploy on any platform.

Leading Performance

Continued optimizations across the entire stack, from models to software to hardware, have delivered a 12X gain over the previous generation.


World-Class Speech AI

As speech-based applications are adopted globally, solutions need to interact with humans across many languages. Speech AI apps need to understand industry-specific jargon and respond naturally in real time. Riva includes world-class automatic speech recognition (ASR) and text-to-speech (TTS) that run in real time.

Try NVIDIA Riva automatic speech recognition.


In this demo, Riva ASR delivers highly accurate transcription in real time.

You can provide an input through your microphone or upload a .wav file from your device.

The duration of each sample is limited to 30 seconds.
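
If you plan to upload a file, it can help to check its length first. The following is a minimal sketch that uses only the Python standard library to measure a .wav file against the 30-second limit stated above. The function names and the 16 kHz sample rate of the generated test clip are assumptions made for illustration, not part of the demo or of Riva.

```python
import io
import wave

MAX_SECONDS = 30.0  # limit stated for the demo


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file, given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_demo_limit(source, max_seconds: float = MAX_SECONDS) -> bool:
    """True if the clip is no longer than the allowed duration."""
    return wav_duration_seconds(source) <= max_seconds


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, only for trying the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for seconds in (5.0, 31.0):
        clip = make_silent_wav(seconds)
        print(seconds, "seconds fits the limit:", fits_demo_limit(clip))
```

Pass a file path instead of the in-memory buffer to check a recording on disk.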


Try NVIDIA Riva Text-to-Speech.

If you’re looking to add voice to your interactive virtual assistant, modern home device, or reading assistant for people with a reading disability or visual impairment, try Riva’s out-of-the-box (OOTB) English female or male voice.

Hear the human-like expressive voices created using Riva’s state-of-the-art (SOTA) neural speech synthesis models.


Your use of Riva Voice Recognition and Riva Text-to-Speech is subject to our Terms of Use. Your data will be used to improve NVIDIA products and services.

Domain-Specific Automatic Speech Recognition

Controllable Text-to-Speech




What Is NVIDIA Riva?

Simple end-to-end workflow for speech AI applications.

Riva offers:

  • Pretrained SOTA speech AI models: ASR and TTS models that can be fully customized on your datasets, accelerating the development of domain-specific models by 10X.
  • High-performance inference: Inference is powered by NVIDIA TensorRT™ optimizations and served using the NVIDIA Triton™ Inference Server, both components of the NVIDIA AI platform.
  • Riva services: These are available as gRPC-based microservices for low-latency streaming and high-throughput offline use cases.
  • High scalability: Fully containerized, Riva can easily scale to hundreds or thousands of parallel streams.

Figure 1: Train and deploy an end-to-end speech AI pipeline using Riva.
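
To make the difference between the streaming and offline use cases concrete: a streaming client typically sends audio in small chunks as it is captured, while an offline client sends the whole recording at once. The sketch below shows only the chunking step in plain Python. It is not the Riva client API; the function name, the 100 ms chunk size, and the 16 kHz 16-bit mono format are assumptions chosen for illustration.

```python
from typing import Iterator


def audio_chunks(
    pcm: bytes,
    sample_rate: int = 16000,  # assumed sample rate in Hz
    sample_width: int = 2,     # assumed bytes per sample (16-bit PCM)
    channels: int = 1,         # assumed mono audio
    chunk_ms: int = 100,       # assumed chunk duration in milliseconds
) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices of raw PCM audio.

    The final slice may be shorter than the others.
    """
    bytes_per_frame = sample_width * channels
    frames_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = frames_per_chunk * bytes_per_frame
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    one_second = bytes(16000 * 2)  # 1 s of 16 kHz 16-bit mono silence
    sizes = [len(chunk) for chunk in audio_chunks(one_second)]
    print(len(sizes), "chunks of", sizes[0], "bytes")
```

In a real streaming client, each yielded chunk would be wrapped in a request message and written to the gRPC stream, so smaller chunks trade some throughput for lower latency.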

Learn more about Riva ASR.

Figure 2: Automatic speech recognition pipeline

Speech recognition technology enables voice search on the internet, hands-free computing, voice commands to smart home devices and in-car assistants, medical note taking, 24/7 virtual assistants for contact centers, and transcription of phone calls and video conferences for pattern and trend analysis. NVIDIA Riva automatic speech recognition (ASR) delivers world-class, accurate transcripts based on GPU-optimized models, fully customizable for any domain or deployment platform.

Key features of Riva ASR include:

  • Support for English, Spanish, Mandarin, Hindi, Russian, German, and French
  • Out-of-the-box models trained on a variety of domain-specific data for hundreds of thousands of hours on NVIDIA GPUs
  • Best-possible accuracy for different languages, accents, domains, vocabulary, and context by fine-tuning vocabulary, lexicon, acoustic, language, punctuation and inverse text normalization models
  • The ability to return streaming transcripts with automatic punctuation and word-level timestamps for hundreds of thousands of input audio streams
  • Word and profanity filtering, with customizable and effective removal of offensive spoken words
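
As an illustration of how word-level timestamps and word filtering can work together, here is a minimal client-side sketch. It does not show how Riva implements filtering internally, which this page does not describe. The `Word` type, the function names, and the blocklist are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Word:
    """One recognized word with its position in the audio (hypothetical type)."""
    text: str
    start_ms: int
    end_ms: int


def mask_blocked_words(
    words: Iterable[Word], blocklist: Iterable[str], mask_char: str = "*"
) -> List[Word]:
    """Replace letters and digits of blocked words with mask_char, keeping timestamps."""
    blocked = {entry.lower() for entry in blocklist}
    result = []
    for word in words:
        # Compare without surrounding punctuation and without regard to case.
        core = word.text.strip(".,!?;:").lower()
        if core in blocked:
            masked = "".join(mask_char if ch.isalnum() else ch for ch in word.text)
            result.append(Word(masked, word.start_ms, word.end_ms))
        else:
            result.append(word)
    return result


def to_transcript(words: Iterable[Word]) -> str:
    """Join the words back into a single transcript string."""
    return " ".join(word.text for word in words)


if __name__ == "__main__":
    recognized = [
        Word("That", 0, 200),
        Word("darn", 200, 450),
        Word("printer.", 450, 900),
    ]
    print(to_transcript(mask_blocked_words(recognized, {"darn"})))
```

Because the timestamps are preserved, the same list could also be used to mute the matching audio span rather than only masking the text.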

Learn more about Riva TTS.

Text-to-speech produces voices that narrate e-books and documents, converse with humans as smart assistants or digital avatars, and are part of nearly all digital devices, including smartphones, tablets, and laptops. NVIDIA Riva text-to-speech (TTS) provides human-like synthetic voices based on state-of-the-art spectrogram generation and vocoder models. TTS pipelines are customizable and GPU-optimized to run efficiently in real time.

Key features of Riva TTS include:

  • SOTA models for generating expressive, human-like voices
  • Two out-of-the-box professional female and male voices for US English
  • Easy voice and accent fine-tuning with pitch, volume, and duration control for expressivity
  • 12X higher inference performance versus existing technologies
Figure 3: Text-to-speech pipeline
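
Pitch, speaking rate, and volume are commonly controlled through the `prosody` element of the W3C Speech Synthesis Markup Language (SSML). The sketch below builds such markup with the Python standard library. Whether a given Riva release accepts SSML input, and which attribute values it supports, is an assumption here and should be checked against the Riva documentation. The function name is hypothetical, and the markup omits the `version` and namespace attributes that a strictly conforming SSML document carries.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_ssml(
    text: str,
    pitch: Optional[str] = None,
    rate: Optional[str] = None,
    volume: Optional[str] = None,
) -> str:
    """Wrap text in <speak>, adding a <prosody> element when any control is set."""
    speak = ET.Element("speak")
    controls = (("pitch", pitch), ("rate", rate), ("volume", volume))
    attrs = {name: value for name, value in controls if value is not None}
    if attrs:
        prosody = ET.SubElement(speak, "prosody", attrs)
        prosody.text = text
    else:
        speak.text = text
    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Welcome back.", pitch="+2st", rate="90%"))
```

Building the markup with an XML library instead of string formatting keeps user-supplied text from breaking the document structure.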

Fast-Track Your Riva Journey with NVIDIA LaunchPad

Get immediate access to NVIDIA Riva with free curated labs. Access step-by-step guided labs for speech AI with ready-to-use software, sample data, and applications.


Apply Now

Read customer stories.

RingCentral video meeting

With NVIDIA Riva, RingCentral achieved unparalleled real-time transcription accuracy for video meetings, serving millions of users worldwide with diverse accents and domain-specific jargon.

Learn more
T-Mobile call center

T-Mobile uses NVIDIA Riva ASR in its call center to accurately transcribe customer conversations and provide real-time recommendations that help agents resolve customer queries quickly.


Learn more
Tarteel's real-time feedback on Quran recitation

Tarteel uses NVIDIA Riva and NVIDIA NeMo to provide real-time feedback on Quran recitation at scale, enabling Muslims, instructors, content creators, and researchers to engage with the Quran.


Learn more
Students using Plabook app in class

Data Monsters added a speech pipeline to the Plabook app using NVIDIA Riva to help students read, assess their accuracy at the phoneme level, and provide individualized feedback.

Learn more
Call center associate helping clients globally

Floatbot uses NVIDIA Riva and NVIDIA TAO for its customized Singaporean English voice AI applications, automating call centers for insurance carriers and finance clients globally.

Learn more

Explore more resources.


Get an introduction.

Understand the key features in Riva that help you build speech AI services.

Read blog

Explore the starter kit.

Get everything you need to start developing your speech AI with NVIDIA Riva, including tutorials, Jupyter Notebooks, and documentation.

Get started

Watch a webinar.

Learn how you can leverage NVIDIA AI to build speech AI applications that deliver world-class accuracy while running in real time across thousands of streams.

Watch now

NVIDIA Riva is available from the NVIDIA GPU Cloud for members of the NVIDIA Developer Program.

Get started