Today, NVIDIA released Jarvis 1.0 Beta, which includes an end-to-end workflow for building and deploying real-time conversational AI apps, such as transcription, virtual assistants and chatbots. Jarvis is a flexible application framework for multimodal conversational AI services that delivers real-time performance on NVIDIA GPUs.
This release of Jarvis includes new pre-trained models for conversational AI and support for Transfer Learning Toolkit (TLT), so enterprises can easily adapt apps to their specific use case and domain. These apps are able to understand context and nuance, offering a better experience to users.
With Jarvis, enterprises get state-of-the-art models, ~10x speedup in development time using transfer learning with TLT, and fully optimized and GPU-accelerated pipelines for creating intelligent language-based applications that can run in real time.
Highlights from this version include:
- ASR, NLU, and TTS models trained on thousands of hours of speech data.
- TLT with a zero-coding approach to quickly re-train models on custom data.
- Fully accelerated deep learning pipelines optimized to run as scalable services.
- End-to-end workflow and tools to deploy services using one line of code.
Conversational AI is opening new opportunities in every industry, from finance and healthcare to consumer services.
Early adopters of Jarvis include InstaDeep, a company creating virtual assistants in the Arabic language. NVIDIA Jarvis played a significant role in improving their application’s performance. Using the NeMo toolkit in Jarvis, they were able to fine-tune an Arabic speech-to-text model to get a Word Error Rate as low as 7.84%.
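For readers unfamiliar with the metric, Word Error Rate is the number of word-level substitutions, deletions and insertions needed to turn a recognizer's output into the reference transcript, divided by the number of words in the reference. The following is a minimal Python sketch of that calculation. It is not part of Jarvis or NeMo, and the function name and sample sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Illustrative helper only (hypothetical name, not a Jarvis or NeMo API).
    Words are split on whitespace; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(f"{word_error_rate('a b c d', 'a x c d'):.2%}")  # prints 25.00%
```

Under this definition, a Word Error Rate of 7.84% corresponds to roughly 8 word errors per 100 reference words.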
One of the largest mobile network operators in Russia, MTS, is working with Jarvis for chatbots and virtual assistants for customer support. With Jarvis, they saw remarkable accuracy by fine-tuning the ASR models in the Russian language and higher throughput with TensorRT optimizations.
Ribbon is leveraging Jarvis in their real-time communications and call processing platform to do advanced AI speech-to-text. Business and government organizations record tens of millions of calls every day, but it’s nearly impossible to search them to pull out important insights. Through Jarvis, recordings can now be turned into text so that AI tools can quickly search and analyze this data.
In the area of healthcare, Northwestern Medicine is working with Artisight to make hospitals smarter.
“At Northwestern Medicine, we aim to improve patient satisfaction and staff productivity with our suite of healthcare AI solutions,” said Andrew Gostine, MD, MBA, CEO of Artisight. “Conversational AI, powered by NVIDIA Clara Guardian and Jarvis, improves patient and staff safety during COVID-19 by reducing direct physical contact while delivering high-quality care. Jarvis ASR and TTS models make this conversational AI a reality. Patients now no longer need to wait for the clinical staff to become available, they can receive immediate answers from an AI-powered virtual assistant.”
Meanwhile, Intelligent Voice, which has a system that uses speech recognition technology to capture calls, convert them into text and automatically send transcripts, saw great results with Jarvis.
“At Intelligent Voice, we provide high performance speech recognition solutions, but our customers are always looking for more,” said Nigel Cannings, CTO at Intelligent Voice. “Jarvis takes a multi-modal approach that fuses key elements of Automatic Speech Recognition with entity and intent matching to address new use cases where high-throughput and low latency are required. The Jarvis API is very easy to use, integrate and customize to our customers’ workflows for optimized performance.”
NVIDIA Jarvis and Transfer Learning Toolkit are freely available for download to members of the NVIDIA developer program today. On the ‘Getting Started’ page, you will find several resources for new users, such as samples, Jupyter notebooks and tutorial blogs.