Riva is a fully accelerated SDK for building multimodal conversational AI applications that use an end-to-end deep learning pipeline. Enterprise developers can easily fine-tune state-of-the-art models on their data to achieve a deeper understanding of their specific context, then optimize for inference to offer end-to-end real-time services that run in under 300 milliseconds (ms) and deliver 7X higher throughput on GPUs compared with CPUs.
The Riva SDK includes pre-trained conversational AI models, the NVIDIA Transfer Learning Toolkit, and optimized end-to-end skills for speech, vision, and natural language processing (NLP) tasks.
Fusing vision, audio, and other sensor inputs simultaneously enables capabilities such as multi-user, multi-context conversations in applications like virtual assistants, multi-user diarization, and call center assistants.
Riva-based applications have been optimized to maximize performance on the NVIDIA EGX™ platform in the cloud, in the data center, and at the edge.
Customize state-of-the-art models, pretrained for over 100,000 hours on NVIDIA DGX™ systems, on industry-specific jargon.
Run end-to-end deep learning-based conversational AI applications in under 300 milliseconds (ms), the latency threshold for real-time performance.
Use one command to deploy conversational AI services in the cloud or at the data center.
State-of-the-Art Interactive Conversational AI
As conversational AI applications expand globally, they need to understand industry-specific jargon to translate and interact with humans more naturally—all in real time. Riva includes world-class automatic speech recognition (ASR) that can be customized across domains, translation to multiple languages, and controllable text-to-speech (TTS) that makes applications more expressive.
World-Class Speech Recognition
Real-Time Machine Translation
Customize for Your Domain with Transfer Learning Toolkit
Transfer Learning Toolkit (TLT) offers a zero-coding approach to fine-tuning pretrained deep learning models, accelerating model development time by up to 10X versus training from scratch. Developers and machine learning practitioners use TLT to maximize accuracy for their domain-specific applications by training on their custom data before deploying to Riva for inference in production.
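As a sketch of this workflow, the TLT launcher exposes each task behind a CLI. The spec files, checkpoint paths, and encryption key below are illustrative placeholders, and exact task and flag names vary by TLT release—consult the TLT documentation in NGC for the current syntax:

```shell
# Fine-tune a pretrained ASR model on custom data.
# Spec file, checkpoint paths, and $ENCRYPTION_KEY are placeholders.
tlt speech_to_text finetune \
    -e specs/finetune.yaml \
    -m pretrained/speechtotext.tlt \
    -k $ENCRYPTION_KEY \
    -r results/

# Export the fine-tuned checkpoint in a Riva-deployable format.
tlt speech_to_text export \
    -e specs/export.yaml \
    -m results/checkpoints/finetuned.tlt \
    -k $ENCRYPTION_KEY
```

The spec files carry the dataset paths and hyperparameters, which is what keeps the workflow itself code-free.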
Pre-trained models and TLT are available in the NVIDIA NGC™ catalog.
Develop New Multimodal Skills
Build multimodal skills such as multi-speaker transcription, chatbots, gesture recognition, and look-to-talk for your conversational AI applications.
With Riva, you can build multimodal pilot apps by fusing speech, language understanding, and vision pipelines along with a dialog manager that supports multi-user, multi-context interactions.
Optimize Task-Specific Skills
Access high-performance skills for tasks such as speech recognition, intent recognition, speech synthesis, pose estimation, gaze detection, and facial landmark detection through a simple API.
Pipelines for each skill can be fused to build new skills. Each pipeline is tuned to deliver the highest possible performance and can be customized for your specific use case.
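The fusion idea can be illustrated conceptually. The sketch below is not Riva's API; the stand-in functions are hypothetical placeholders that only show how per-skill pipelines might be chained into a new skill:

```python
from typing import Callable, Dict

# Stand-in "pipelines": real Riva skills (ASR, NLP, TTS, vision) are
# GPU-accelerated services; these placeholders only illustrate composition.
def asr_pipeline(audio: str) -> str:
    """Pretend transcription; here the 'audio' is already text."""
    return audio.lower()

def intent_pipeline(transcript: str) -> Dict[str, str]:
    """Pretend intent recognition via naive keyword matching."""
    intent = "weather_query" if "weather" in transcript else "unknown"
    return {"transcript": transcript, "intent": intent}

def fuse(*stages: Callable) -> Callable:
    """Chain pipelines so each stage's output feeds the next stage."""
    def fused(x):
        for stage in stages:
            x = stage(x)
        return x
    return fused

# A new "skill" built by fusing two existing pipelines.
voice_intent = fuse(asr_pipeline, intent_pipeline)
result = voice_intent("What is the WEATHER today?")
print(result["intent"])  # weather_query
```

In Riva the per-skill pipelines are served behind APIs rather than in-process functions, but the composition pattern is the same.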
Build and Deploy Skills Easily
Automate the steps that go from pretrained models to optimized skills deployed in the cloud and in the data center. Under the hood, Riva applies powerful NVIDIA TensorRT™ optimizations to models, configures the NVIDIA Triton™ Inference Server, and exposes the models as a service through a standard API.
To deploy, you can use a single command to download, set up, and run the entire Riva application or individual services through Helm charts on Kubernetes clusters. The Helm charts can be customized for your use case and are available in NGC.
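Deployment via a Helm chart follows this general shape. The repository URL, chart name, and value keys below are illustrative assumptions—check the Riva page in NGC for the current chart and its configurable values:

```shell
# Add the NGC Helm repository ($NGC_API_KEY is a placeholder for your key).
helm repo add riva https://helm.ngc.nvidia.com/nvidia/riva \
    --username='$oauthtoken' --password=$NGC_API_KEY

# Fetch the chart locally to customize its values for your use case,
# then install it on the Kubernetes cluster. Chart name is illustrative.
helm fetch riva/riva-api --untar
helm install riva-api riva-api/ \
    --set ngcCredentials.password=$NGC_API_KEY
```

The single `helm install` triggers the download, setup, and startup of the Riva services; editing the untarred chart's values lets you enable or disable individual services.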
Leading Adopters Across All Verticals
Ribbon Communications analyzes contact center calls using Riva to make smarter business decisions.
Get Started with NVIDIA
Understand the key features in Riva that help you build multimodal conversational AI services.
Fine-Tune Models with Transfer Learning Toolkit
Learn to fine-tune state-of-the-art models on your data so they understand domain-specific jargon.