Following recent breakthroughs in natural language processing, NVIDIA today announces new inference speedups for automatic speech recognition, natural language processing, and text-to-speech with TensorRT 7. With this release, NVIDIA now accelerates both training and deployment of the entire conversational AI pipeline.
Conversational AI comprises three key components, each requiring several models to be chained together:
- An automatic speech recognition (ASR) component
- A natural language processing (NLP) component for question answering (QA) tasks
- A text-to-speech (TTS), or speech synthesis, component
Additional models for translation and recommendations are generally required to generate a meaningful response for the user. For conversational AI to be interactive, it needs to work in real time, with latencies of under 300 ms.
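To make the chaining concrete, here is a minimal sketch of the three-stage pipeline with stubbed-in models. The functions `transcribe`, `answer`, and `synthesize` are hypothetical placeholders for deployed ASR, NLP, and TTS models, not NVIDIA APIs:

```python
import time

# Hypothetical placeholders for deployed models; in practice each stage
# would invoke a real ASR, QA, or TTS model (e.g., a TensorRT engine).
def transcribe(audio: bytes) -> str:      # ASR: speech -> text
    return "what is conversational ai"

def answer(question: str) -> str:         # NLP: question answering
    return "A pipeline of ASR, NLP, and TTS models."

def synthesize(text: str) -> bytes:       # TTS: text -> speech
    return text.encode("utf-8")

def respond(audio: bytes) -> bytes:
    """Chain ASR -> NLP -> TTS; for interactive use, the end-to-end
    latency budget is roughly 300 ms."""
    start = time.perf_counter()
    speech = synthesize(answer(transcribe(audio)))
    print(f"end-to-end latency: {(time.perf_counter() - start) * 1000:.1f} ms")
    return speech

respond(b"\x00")  # dummy audio input
```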
The latest CUDA-X AI updates enable easy, real-time deployment of conversational AI applications. These updates include:
TensorRT
TensorRT 7 delivers over 10x faster performance than CPU-only inference on conversational AI workloads, with support for speech recognition, natural language understanding, and text-to-speech for smarter, more natural human-to-AI conversation.
Highlights in this release include:
- Support for recurrent neural networks with a new compiler
- Support for over 20 new ONNX ops to import speech models
- Expanded dynamic shapes support for easily optimizing speech models
Examples are available from NGC and the TensorRT GitHub repository.
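As an illustration of how the ONNX importer and dynamic shapes fit together, here is a minimal sketch using the TensorRT 7 Python API. The ONNX file name, the input tensor name `audio_signal`, and the shape ranges are assumptions about a particular speech model, not fixed values:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# ONNX parsing and dynamic shapes require an explicit-batch network.
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(EXPLICIT_BATCH)
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("asr_model.onnx", "rb") as f:  # illustrative model file
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB of build workspace

# Dynamic shapes: an optimization profile covering the range of audio
# feature lengths the engine should accept. The tensor name and
# dimensions are assumptions; match them to the actual model input.
profile = builder.create_optimization_profile()
profile.set_shape("audio_signal",
                  min=(1, 64, 100),    # shortest expected utterance
                  opt=(1, 64, 500),    # typical utterance
                  max=(1, 64, 1000))   # longest expected utterance
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)  # ready for deployment
```

At inference time, the actual input shape is set on the execution context before each run, so a single engine can serve utterances of varying length within the profiled range.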
Neural Modules (NeMo) Toolkit
NeMo is a PyTorch-based toolkit for building conversational AI models. Through modular deep neural network development, NeMo enables fast experimentation: developers connect modules and mix and match components.
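As a minimal sketch of that workflow, the snippet below loads a pretrained ASR module from NGC and transcribes an audio file. The collection path, model class, checkpoint name, and file path are assumptions that vary across NeMo releases; check the repository for the exact API:

```python
import nemo.collections.asr as nemo_asr

# Load a pretrained ASR module from NGC (the checkpoint name is an
# example; the NGC catalog lists the available pretrained models).
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(
    model_name="QuartzNet15x5Base-En"
)

# Transcribe a local audio file (the path is illustrative).
print(asr_model.transcribe(["sample.wav"]))
```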
Early collaborators are excited by the ease of use and flexibility that NeMo provides when building complex language models.
“The SpeechBrain project aims to create a single, flexible toolkit that makes it easy to develop state-of-the-art speech technologies. By partnering with NVIDIA, we look forward to incorporating seamless mixing and manipulation of deep learning components through the Neural Modules (NeMo) toolkit, which is pivotal in providing the flexibility and modularity needed to accelerate the development of speech applications.” – Yoshua Bengio, Founder and Scientific Director of Mila, ACM Turing Award Laureate, Professor, University of Montreal
Get Started: a new technical post demonstrates how to build domain-specific ASR models using NeMo
More examples are also available from NGC and the NeMo GitHub repository
NVIDIA Riva Multimodal AI SDK
NVIDIA Riva is a software development kit for building multimodal AI applications that combine computer vision, speech, and other sensor inputs to enable natural conversation between humans and computers. Apply for exclusive news, updates, and early access to NVIDIA Riva.