Jarvis is a fully accelerated application framework for building multimodal conversational AI services that use an end-to-end deep learning pipeline. Enterprise developers can easily fine-tune state-of-the-art models on their own data to achieve a deeper understanding of their specific context, and optimize them for inference to offer end-to-end real-time services that run in under 300 milliseconds (ms) and deliver 7x higher throughput on GPUs compared with CPUs.
The Jarvis framework includes pre-trained conversational AI models, tools in the NVIDIA AI Toolkit, and optimized end-to-end services for speech, vision, and natural language understanding (NLU) tasks.
Fusing vision, audio, and other sensor inputs simultaneously enables capabilities such as multi-user, multi-context conversations in applications like virtual assistants, multi-user diarization, and call center assistants.
Jarvis-based applications have been optimized to maximize performance on the NVIDIA EGX™ platform in the cloud, in the data center, and at the edge.
Run deep learning-based conversational AI applications in under 300 ms, the latency threshold for real-time performance.
Fuse speech and vision to offer accurate and natural interactions in virtual assistants, chatbots, and other conversational AI applications.
Use one command to deploy conversational AI services in the cloud or at the edge.
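The one-command deployment mentioned above is provided by the Jarvis Quick Start scripts published on NGC. A minimal sketch of the flow follows; the script names match the published quickstart, but the `<version>` tag and the exact models enabled in `config.sh` vary by release, so treat this as an outline rather than copy-paste commands:

```shell
# Download the Jarvis Quick Start scripts from NGC
# (replace <version> with the release tag you are targeting)
ngc registry resource download-version "nvidia/jarvis/jarvis_quickstart:<version>"
cd jarvis_quickstart_v<version>

# Edit config.sh to choose which services (ASR, NLU, TTS) and models to deploy.
# jarvis_init.sh then downloads the pre-trained models and builds
# TensorRT-optimized engines for the target GPU.
bash jarvis_init.sh

# Start the Jarvis services; client applications connect over gRPC.
bash jarvis_start.sh
```

These are deployment commands that require an NVIDIA GPU and NGC credentials, so they are shown here as a configuration sketch only.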
State-of-the-Art Interactive Conversational AI
As conversational AI applications expand globally, they need to understand industry-specific jargon, translate between languages, and interact with humans more naturally, all in real time. Jarvis includes world-class ASR that can be customized across domains, real-time translation across multiple languages, and controllable TTS, making applications more expressive.
World-Class Speech Recognition
Real-Time Machine Translation
“Ping An addresses millions of queries from customers each day using chat-bot agents. As an early partner of the Jarvis early access program, we were able to use the tools and build better solutions with higher accuracy and lower latency, thus providing better services. More specifically, with NeMo, the pre-trained model, and the ASR pipeline optimized using Jarvis, the system achieved 5% improvement on accuracy, so as to serve our customers with better experience.”
— Dr. Jing Xiao, Chief Scientist at Ping An
“In our evaluation of Jarvis for virtual assistants and speech analytics, we saw remarkable accuracy by fine-tuning the Automated Speech Recognition models in the Russian language using the NeMo toolkit in Jarvis. Jarvis can provide up to 10x throughput performance with powerful TensorRT optimizations on models, so we’re looking forward to using Jarvis to get the most out of these technology advancements.”
— Nikita Semenov, Head of ML at MTS AI
“InstaDeep delivers decision-making AI products and solutions for enterprises. For this project, our goal is to build a virtual assistant in the Arabic language, and NVIDIA Jarvis played a significant role in improving the application’s performance. Using the NeMo toolkit in Jarvis, we were able to fine-tune an Arabic speech-to-text model to get a Word Error Rate as low as 7.84% and reduced the training time of the model from days to hours using GPUs. We look forward to integrating these models in Jarvis’ end-to-end pipeline to ensure real-time latency.”
— Karim Beguir, CEO and Co-Founder at InstaDeep
“At Intelligent Voice, we provide high performance speech recognition solutions, but our customers are always looking for more. Jarvis takes a multi-modal approach that fuses key elements of Automatic Speech Recognition with entity and intent matching to address new use cases where high-throughput and low latency are required. The Jarvis API is very easy to use, integrate and customize to our customers’ workflows for optimized performance.”
— Nigel Cannings, CTO at Intelligent Voice
“At Northwestern Medicine, we aim to improve patient satisfaction and staff productivity with our suite of healthcare AI solutions. Conversational AI, powered by NVIDIA Clara Guardian and Jarvis, improves patient and staff safety during COVID-19 by reducing direct physical contact while delivering high-quality care. Jarvis ASR and TTS models make this conversational AI a reality. Patients now no longer need to wait for the clinical staff to become available, they can receive immediate answers from an AI-powered virtual assistant.”
— Andrew Gostine, MD, MBA, CEO of Artisight
“Low latency is critical in call centers, and with NVIDIA GPUs, our agents are able to listen, understand, and respond in under a second with the highest levels of accuracy. Based on early evaluations of speech and language understanding pipelines in NVIDIA Jarvis, we believe we can improve latency even further while maintaining accuracy, delivering the best experience possible for our customers.”
— Alan Bekker, co-founder and CTO of Voca
“Through the NVIDIA Jarvis early access program, we’ve been able to power our conversational AI products with state-of-the-art models using NVIDIA NeMo, significantly reducing the cost of getting started. Jarvis speech recognition has amazingly low latency and high accuracy. Having the flexibility to deploy on-prem and offer a range of data privacy and security options to our customers has helped us position our conversational AI-enabled products in new industry verticals.”
— Rajesh Jha, CEO of SimInsights
“Conversational AI applications are data hungry. Imagine the data needed to train models or the storage required to hold all of the information to have more natural and useful interactions. Jarvis helped us to leverage this data to reach our goal of building virtual assistants for retail stores faster. Jarvis pipelines use state-of-the-art deep learning models and run the conversational applications in milliseconds.”
— AJ Mahajan, Senior Director, Solutions at NetApp