Build Speech AI in Multiple Languages and Train Large Language Models with the Latest from Riva and NeMo Framework

Graphical representation of automatic speech recognition for transcription, controllable text-to-speech, and natural language processing in a chatbot.

Mar 28, 2022

By Siddharth Sharma, Gordana Neskovic and Sirisha Rella

Major updates to Riva, an SDK for building speech AI applications, and a paid Riva Enterprise offering were announced at NVIDIA GTC 2022 last week. Several key updates to the NeMo framework, which is used for training large language models, were also announced.

Riva 2.0 general availability

Riva offers world-class accuracy for real-time automatic speech recognition (ASR) and text-to-speech (TTS) skills across multiple languages and can be deployed on-prem, in any cloud, or on embedded platforms. Industry leaders such as Snap, T-Mobile, RingCentral, and Kore.ai use Riva in customer care center applications, transcription, and virtual assistants.

The latest Riva version includes:

ASR in multiple languages: English, Spanish, German, Russian, and Mandarin.
High-quality TTS voices customizable for unique voice fonts.
Domain-specific customization with TAO Toolkit or NVIDIA NeMo for unparalleled accuracy in accent, domain, and country-specific jargon.
Support to run in cloud, on-prem, and on embedded platforms.

Try Riva automatic speech recognition on the Riva product page.

Defined.ai has collaborated with NVIDIA to provide a smooth workflow for enterprises looking to purchase speech training and validation data across languages, domains, and recording types.

Download Riva from NGC. It is available free to members of the NVIDIA Developer program.

Riva Enterprise

NVIDIA also introduced Riva Enterprise, a paid offering for enterprises deploying Riva at scale with business-standard support from NVIDIA experts.

Benefits include:

Unlimited use of ASR and TTS services on any cloud and on-prem platforms.
Access to NVIDIA AI experts during local business hours for guidance on configurations and performance.
Long-term support for control over maintenance and upgrade schedules.
Priority access to new releases and features.

Riva Enterprise is available as a free trial on NVIDIA Launchpad for enterprises to evaluate and prototype their applications.

Riva Enterprise on Launchpad includes guided labs to:

Interact with Real-Time Speech AI APIs.
Add Speech AI Capabilities to a Conversational AI Application.
Fine-Tune a Speech AI Pipeline on Custom Data for Higher Accuracy.

Apply for your Riva Enterprise trial.

Learn more about how to build, optimize, and deploy speech AI applications from the Conversational AI Demystified GTC session.

NeMo framework

NVIDIA announced updates to the NVIDIA NeMo framework, which is used for training large language models (LLMs) with up to trillions of parameters. The framework builds on innovations from the Megatron paper and lets research institutions and enterprises train any LLM to convergence. It provides data preprocessing, parallelism (data, tensor, and pipeline), orchestration and scheduling, and auto-precision adaptation.
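
To make the three forms of parallelism concrete, here is a minimal Python sketch of how they are commonly combined in Megatron-style training: the total GPU count is the product of the tensor-, pipeline-, and data-parallel degrees. The `ParallelLayout` class and `derive_layout` function are hypothetical names used for illustration only; they are not part of the NeMo framework API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical container for the three parallelism degrees."""

    tensor_parallel: int    # GPUs that split each layer's weight matrices
    pipeline_parallel: int  # pipeline stages, each holding a slice of the layers
    data_parallel: int      # full model replicas, each fed different batches

    @property
    def world_size(self) -> int:
        # Every GPU belongs to exactly one (tensor, pipeline, data) coordinate.
        return self.tensor_parallel * self.pipeline_parallel * self.data_parallel


def derive_layout(world_size: int, tensor_parallel: int,
                  pipeline_parallel: int) -> ParallelLayout:
    """Derive the data-parallel degree left over after model parallelism."""
    if min(world_size, tensor_parallel, pipeline_parallel) < 1:
        raise ValueError("all sizes must be positive integers")
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {model_parallel}"
        )
    return ParallelLayout(tensor_parallel, pipeline_parallel,
                          world_size // model_parallel)


# Example: 64 GPUs, each layer split 8 ways, layers split into 2 stages.
layout = derive_layout(world_size=64, tensor_parallel=8, pipeline_parallel=2)
print(layout)
```

In this example, one model replica occupies 16 GPUs, so the remaining factor of 4 becomes the data-parallel degree.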

It consists of thoroughly tested recipes, popular LLM architecture implementations, and necessary tools for organizations to quickly start their LLM journey.

AI Sweden, JD.com, Naver, and the University of Florida are early adopters of NVIDIA technologies for building large language models.

The latest version includes:

Hyperparameter tuning tool: automatically creates recipes based on customers’ needs and infrastructure limitations.
Reference recipes for T5 and mT5 models.
Support for training LLMs in the cloud, starting with Azure.
Distributed data preprocessing scripts to shorten end-to-end training time.
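
To give a feel for what deriving a recipe from infrastructure limits can involve, the following Python sketch enumerates parallelism layouts that fit a given GPU budget. It is a toy illustration, not the algorithm used by the NeMo hyperparameter tuning tool. The function name `candidate_layouts`, the 16-bytes-per-parameter estimate (a common rule of thumb for mixed-precision Adam training), and the 8-GPUs-per-node limit on tensor parallelism are all assumptions, and activation memory is ignored.

```python
from itertools import product

# Assumption: mixed-precision Adam keeps about 16 bytes of state per parameter
# (fp16 weights and gradients, plus fp32 master weights and two Adam moments).
BYTES_PER_PARAM = 16


def candidate_layouts(num_gpus, num_params, gpu_memory_bytes, gpus_per_node=8):
    """Return (tensor, pipeline, data) degrees whose model state fits per GPU.

    Hypothetical helper: it ignores activations and communication cost.
    """
    divisors = [d for d in range(1, num_gpus + 1) if num_gpus % d == 0]
    results = []
    for tp, pp in product(divisors, divisors):
        # Assumption: tensor parallelism stays inside one node.
        if tp > gpus_per_node or num_gpus % (tp * pp) != 0:
            continue
        state_per_gpu = num_params * BYTES_PER_PARAM / (tp * pp)
        if state_per_gpu <= gpu_memory_bytes:
            results.append((tp, pp, num_gpus // (tp * pp)))
    # Prefer the most data parallelism, then the fewest pipeline stages.
    return sorted(results, key=lambda c: (-c[2], c[1]))


# Example: a 20-billion-parameter model on 16 GPUs with 80 GB each.
candidates = candidate_layouts(
    num_gpus=16,
    num_params=20_000_000_000,
    gpu_memory_bytes=80 * 10**9,
)
for tp, pp, dp in candidates:
    print(f"tensor={tp} pipeline={pp} data={dp}")
```

A real tuning tool would also need to account for activation memory, batch size, and interconnect bandwidth, and would typically measure throughput rather than rely on a static ranking.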

Apply for NeMo framework early access.

Learn more about interesting applications of LLMs and best practices to deploy them in the Natural Language Understanding in Practice: Lessons Learned from Successful Enterprise Deployments GTC session.

About the Authors

About Siddharth Sharma
Siddharth Sharma is the director of Developer Marketing at NVIDIA and is focused on AI and Data Science technologies. Before joining NVIDIA, Siddharth was a senior product marketing manager for Simulink and Stateflow at MathWorks, working closely with automotive and aerospace companies to adopt model-based designs for creating control software.

About Gordana Neskovic
Gordana Neskovic is on the AI / DL product marketing team responsible for NVIDIA Maxine. Gordana has held various product marketing, data scientist, AI architect, and engineering roles at VMware, Wells Fargo, Pinterest, SFO-ITT, and KLA-Tencor before joining NVIDIA. She holds a Ph.D. from Santa Clara University and master’s and bachelor’s degrees in electrical engineering from the University of Belgrade, Serbia.

About Sirisha Rella
Sirisha Rella is a technical product marketing manager at NVIDIA focused on computer vision, speech, and language-based deep learning applications. Sirisha received her master’s degree in computer science from the University of Missouri-Kansas City and was a graduate research assistant at the National Science Foundation - Center for Big Learning.
