Major updates to Riva, an SDK for building speech AI applications, and a paid Riva Enterprise offering were announced at NVIDIA GTC 2022 last week. Several key updates to the NeMo framework, a framework for training large language models (LLMs), were also announced.
Riva 2.0 general availability
Riva offers world-class accuracy for real-time automatic speech recognition (ASR) and text-to-speech (TTS) skills across multiple languages and can be deployed on-prem or in any cloud. Industry leaders such as Snap, T-Mobile, RingCentral, and Kore.ai use Riva in customer care center applications, transcription, and virtual assistants.
The latest Riva version includes:
- ASR in multiple languages: English, Spanish, German, Russian, and Mandarin.
- High-quality TTS voices customizable for unique voice fonts.
- Domain-specific customization with TAO Toolkit or NVIDIA NeMo for unparalleled accuracy in accent, domain, and country-specific jargon.
- Support for running in the cloud, on-prem, and on embedded platforms.
Try Riva automatic speech recognition on the Riva product page.
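To give a feel for what trying the ASR service involves, here is a minimal sketch of a batch (offline) recognition request using the Riva Python client. The package name (`nvidia-riva-client`, imported as `riva.client`), the `Auth`, `ASRService`, and `RecognitionConfig` names, and the server address are assumptions based on the client library, and a running Riva server is required for the call itself to succeed:

```python
# Hedged sketch: transcribing a WAV file with the Riva Python client.
# The riva.client API names below are assumptions from the
# nvidia-riva-client package; a Riva server must be reachable.
try:
    import riva.client  # assumed package; not part of the stdlib
    HAVE_RIVA = True
except ImportError:
    HAVE_RIVA = False


def transcribe(path: str, server: str = "localhost:50051") -> str:
    """Send one offline recognition request and return the top transcript."""
    auth = riva.client.Auth(uri=server)          # plaintext gRPC channel
    asr = riva.client.ASRService(auth)
    config = riva.client.RecognitionConfig(
        language_code="en-US",                   # any supported language
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )
    with open(path, "rb") as f:
        response = asr.offline_recognize(f.read(), config)
    return response.results[0].alternatives[0].transcript


if __name__ == "__main__":
    if HAVE_RIVA:
        print(transcribe("sample.wav"))          # hypothetical input file
    else:
        print("nvidia-riva-client not installed; skipping transcription")
```

The guarded import lets the script degrade cleanly when the client package is absent; in a real deployment you would install the client and point `server` at your Riva endpoint.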
Defined.ai has collaborated with NVIDIA to provide a smooth workflow for enterprises looking to purchase speech training and validation data across languages, domains, and recording types.
Download Riva, which is available free for members of the NVIDIA Developer program from NGC.
NVIDIA also introduced Riva Enterprise, a paid offering for enterprises deploying Riva at scale, with business-standard support from NVIDIA experts. Riva Enterprise includes:
- Unlimited use of ASR and TTS services on any cloud and on-prem platforms.
- Access to NVIDIA AI experts during local business hours for guidance on configurations and performance.
- Long-term support with control over maintenance and upgrade schedules.
- Priority access to new releases and features.
Riva Enterprise is available as a free trial on NVIDIA Launchpad for enterprises to evaluate and prototype their applications.
Riva Enterprise on LaunchPad includes guided labs to:
- Interact with Real-Time Speech AI APIs.
- Add Speech AI Capabilities to a Conversational AI Application.
- Fine-Tune a Speech AI Pipeline on Custom Data for Higher Accuracy.
Apply for your Riva Enterprise trial.
Learn more about how to build, optimize, and deploy speech AI applications from the Conversational AI Demystified GTC session.
NVIDIA announced new updates to the NVIDIA NeMo framework, a framework for training large language models (LLMs) with up to trillions of parameters. Built on innovations from the Megatron paper, the NeMo framework enables research institutions and enterprises to train any LLM to convergence. It provides data preprocessing, parallelism (data, tensor, and pipeline), orchestration and scheduling, and auto-precision adaptation.
It consists of thoroughly tested recipes, popular LLM architecture implementations, and necessary tools for organizations to quickly start their LLM journey.
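The parallelism schemes named above can be illustrated with a toy example, independent of NeMo's actual APIs: in tensor (column) parallelism, a layer's weight matrix is split column-wise so each device computes a disjoint slice of the output, and the slices concatenate to the single-device result. A pure-Python sketch of that invariant:

```python
# Conceptual sketch of tensor (column) parallelism -- not NeMo's API.
# Each "device" holds a column shard of the weight matrix and computes
# its slice of the output independently.

def matmul(x, w):
    """Multiply row vector x (list) by matrix w (list of rows)."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]

def split_columns(w, parts):
    """Split matrix w column-wise into `parts` equal shards."""
    n = len(w[0]) // parts
    return [[row[p * n:(p + 1) * n] for row in w] for p in range(parts)]

x = [1.0, 2.0]
W = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

# Single-device reference result.
full = matmul(x, W)

# "Two-device" column-parallel result: each shard is computed
# independently; concatenation recovers the full output.
shards = split_columns(W, 2)
parallel = [v for shard in shards for v in matmul(x, shard)]

assert parallel == full  # [11.0, 14.0, 17.0, 20.0]
```

In a real LLM the shards live on different GPUs and the concatenation (or a later reduction) is a collective communication step; the NeMo framework manages that partitioning and scheduling automatically.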
AI Sweden, JD.com, Naver, and the University of Florida are early adopters of NVIDIA technologies for building large language models.
The latest version includes:
- Hyperparameter tuning tool that automatically creates recipes based on customers' needs and infrastructure limitations.
- Reference recipes for T5 and mT5 models.
- Support for training LLMs in the cloud, starting with Azure.
- Distributed data preprocessing scripts to shorten end-to-end training time.
Apply for NeMo framework early access.
Learn more about interesting applications of LLMs and best practices to deploy them in the Natural Language Understanding in Practice: Lessons Learned from Successful Enterprise Deployments GTC session.