NVIDIA Releases Riva 1.0 Beta for Building Real-Time Conversational AI Services

Today, NVIDIA released the Riva 1.0 Beta, which includes an end-to-end workflow for building and deploying real-time conversational AI apps such as transcription services, virtual assistants, and chatbots. Riva is an accelerated SDK for multimodal conversational AI services that delivers real-time performance on NVIDIA GPUs.

This release of Riva includes new pretrained models for conversational AI and support for the NVIDIA Transfer Learning Toolkit (TLT), so enterprises can easily adapt apps to their specific use case and domain. These apps can understand context and nuance, offering a better experience to users.

With Riva, enterprises get state-of-the-art models, ~10x speedup in development time using transfer learning with TLT, and fully optimized and GPU-accelerated pipelines for creating intelligent language-based applications that can run in real time.

Highlights from this version include:

ASR, NLU, and TTS models trained on thousands of hours of speech data.
TLT with a zero-coding approach to quickly retrain models on custom data.
Fully accelerated deep learning pipelines optimized to run as scalable services.
End-to-end workflow and tools to deploy services using one line of code.

Conversational AI is opening new opportunities in every industry, from finance and healthcare to consumer services.

Early adopters of Riva include InstaDeep, a company creating virtual assistants in the Arabic language. NVIDIA Riva played a significant role in improving their application’s performance. Using the NeMo toolkit in Riva, they were able to fine-tune an Arabic speech-to-text model to get a Word Error Rate as low as 7.84%.
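For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word-level substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition. It is not taken from Riva or NeMo; the function name and the sample sentences are made up for the example, and it does not reproduce how InstaDeep measured their result.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by reference length.

    Hypothetical helper for illustration only; not part of Riva or NeMo.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one dropped word out of six reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.2%}")
```

Under this definition, a WER of 7.84% corresponds to roughly eight word errors per hundred reference words.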

One of the largest mobile network operators in Russia, MTS, is working with Riva to build chatbots and virtual assistants for customer support. With Riva, they saw remarkable accuracy from fine-tuning the ASR models for the Russian language and higher throughput with TensorRT optimizations.

Ribbon is leveraging Riva in their real-time communications and call processing platform to do advanced AI speech-to-text. Business and government organizations record tens of millions of calls every day, but it’s nearly impossible to search them to pull out important insights. Through Riva, recordings can now be turned into text so that AI tools can quickly search and analyze this data.

In the area of healthcare, Northwestern Medicine is working with Artisight to make hospitals smarter.

“At Northwestern Medicine, we aim to improve patient satisfaction and staff productivity with our suite of healthcare AI solutions,” said Andrew Gostine, MD, MBA, CEO of Artisight. “Conversational AI, powered by NVIDIA Clara Guardian and Riva, improves patient and staff safety during COVID-19 by reducing direct physical contact while delivering high-quality care. Riva ASR and TTS models make this conversational AI a reality. Patients now no longer need to wait for the clinical staff to become available, they can receive immediate answers from an AI-powered virtual assistant.”

Meanwhile, Intelligent Voice, whose system uses speech recognition technology to capture calls, convert them into text, and automatically send transcripts, saw great results with Riva.

“At Intelligent Voice, we provide high performance speech recognition solutions, but our customers are always looking for more,” said Nigel Cannings, CTO at Intelligent Voice. “Riva takes a multi-modal approach that fuses key elements of Automatic Speech Recognition with entity and intent matching to address new use cases where high-throughput and low latency are required. The Riva API is very easy to use, integrate and customize to our customers’ workflows for optimized performance.”

NVIDIA Riva and the Transfer Learning Toolkit are available today as free downloads for members of the NVIDIA Developer Program. On the ‘Getting Started’ page, you’ll find resources for new users, including samples, Jupyter notebooks, and tutorial posts.