At GTC 2021, NVIDIA announced new software tools to help developers build optimized conversational AI, recommender, and video solutions. Watch the keynote from CEO Jensen Huang for insights on all of the latest GPU technologies.

https://www.youtube.com/watch?v=L_4RhW4UcM8

Announcing Availability of NVIDIA Riva

Today NVIDIA announced major conversational AI capabilities in NVIDIA Riva that will help enterprises build engaging and accurate applications for their customers. These include highly accurate automatic speech recognition, real-time translation for multiple languages, and text-to-speech capabilities to create expressive conversational AI agents.

Highlights include:

  • Out-of-the-box speech recognition models trained on multiple large corpora with greater than 90% accuracy
  • Transfer Learning Toolkit in TAO to fine-tune models on any domain
  • Real-time translation for five languages, running with under 100 ms latency per sentence
  • Expressive text-to-speech that delivers 30x higher throughput compared with Tacotron2
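
Speech recognition accuracy is typically reported as word error rate (WER); "greater than 90% accuracy" corresponds to a WER below 10%. As a reference for how that figure is computed, here is a minimal WER sketch in plain Python (illustrative only, not part of the Riva API):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") in a six-word reference: WER = 1/6
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

Greater than 90% accuracy means transcripts of this quality or better, averaged over the test corpus.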

The new capabilities are planned for release in Q2 2021 as part of the NVIDIA Riva open beta program.

Resources:

 > NVIDIA Riva Developer Blog posts: Includes introduction to Riva and tutorials for building conversational AI apps.

Add this GTC session to your calendar to learn more:

 > Building and Deploying a Custom Conversational AI App with NVIDIA Transfer Learning Toolkit and Riva


Announcing NVIDIA TAO Framework – Early Access

Today NVIDIA announced NVIDIA Train, Adapt, and Optimize (TAO), a GUI-based, workflow-driven framework that simplifies and accelerates the creation of enterprise AI applications and services. By fine-tuning pretrained models, enterprises can produce domain-specific models in hours rather than months, eliminating the need for large training runs and deep AI expertise.

NVIDIA TAO simplifies the time-consuming parts of a deep learning workflow, from data preparation to training to optimization, shortening the time to value. 

Highlights include:

  • Access a diverse set of pretrained models including speech, vision, natural language understanding and more
  • Speed up your AI development by over 10x with NVIDIA pretrained models and TLT
  • Increase model performance with federated learning while preserving data privacy
  • Optimize models for high-throughput, low-latency inference with NVIDIA TensorRT
  • Deploy any model architecture with an optimal configuration on CPU or GPU with NVIDIA Triton Inference Server
  • Seamlessly deploy and orchestrate AI applications with NVIDIA Fleet Command

Apply for early access to NVIDIA TAO here


Announcing NVIDIA Maxine – Available for Download Now

Today NVIDIA announced the availability of the NVIDIA Maxine SDKs, which developers use to build innovative virtual collaboration and content creation applications such as video conferencing and live streaming. Maxine’s state-of-the-art AI technologies are highly optimized and deliver the highest performance possible on GPUs, both on PCs and in data centers.

Highlights from this release include:

  • Video Effects SDK: super resolution, video noise removal, virtual background
  • Augmented Reality SDK: 3D effects such as face tracking and body pose estimation
  • Audio Effects SDK: high quality noise removal and room echo removal

In addition, we announced AI Face Codec, a novel AI-based method from NVIDIA Research to compress videos and render human faces for video conferencing. It can deliver up to a 10x reduction in bandwidth vs. H.264.

Developers building Maxine-based apps can use NVIDIA Riva for real-time transcription, translation, and virtual assistant capabilities.

Get started with Maxine

Resources:

  > Reinvent Video Conferencing, Content Creation & Streaming with AI Using NVIDIA Maxine

Add these GTC sessions to your calendar to learn more:

 > NVIDIA Maxine: An Accelerated Platform SDK for Developers of Video Conferencing Services

 > How to Process Live Video Streams on Cloud GPUs Using NVIDIA Maxine SDK

 > Real-time AI for Video-Conferencing with Maxine


Announcing NVIDIA Triton Inference Server 2.9

Today, NVIDIA announced the latest version of the Triton Inference Server. Triton is an open source inference serving software that maximizes performance and simplifies production deployment at scale. 

Highlights from this release include:

  • Model Navigator, a new tool in Triton (alpha), automatically converts TensorFlow and PyTorch models to TensorRT plan, validates accuracy, and sets up a deployment environment.
  • Model Analyzer now automatically determines optimal batch size and number of concurrent model instances to maximize performance, based on latency or throughput targets.
  • Support for an OpenVINO backend (beta) for high-performance inference on CPU, a Windows Triton build (alpha), and integration with MLOps platforms: Seldon and Allegro

Download Triton from NGC. Access code and documentation in the triton-inference-server GitHub repo.

Add this GTC session to your calendar to learn more:

 > Easily Deploy AI Deep Learning Models at Scale with Triton Inference Server


Announcing TensorRT 8.0

Today, NVIDIA announced TensorRT 8.0, the latest version of its high-performance deep learning inference SDK. TensorRT includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications. With the new features and optimizations, inference applications can now run up to 2x faster with INT8 precision, with accuracy similar to FP32.

Highlights from this release include:

  • Quantization-aware training to achieve FP32-level accuracy with INT8 precision
  • Support for sparsity delivers up to 50% higher throughput on Ampere GPUs
  • Up to 2x faster inference for transformer-based networks like BERT with new compiler optimizations
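
TensorRT's INT8 path maps FP32 tensors to 8-bit integers through a per-tensor scale, and quantization-aware training learns weights that stay accurate under that rounding. A conceptual sketch of symmetric INT8 quantization in plain Python (illustrative only, not the TensorRT API):

```python
def quantize_int8(values, amax):
    """Symmetric INT8 quantization: map [-amax, amax] onto [-127, 127]."""
    scale = amax / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from INT8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.2, 0.03, 2.4]
q, scale = quantize_int8(weights, amax=2.4)
restored = dequantize(q, scale)
# Each restored value lies within half a quantization step of the original
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

Quantization-aware training inserts this quantize/dequantize round trip into the forward pass during training, so the network learns weights that survive the rounding at inference time.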

TensorRT 8 will be available in Q2 2021 from the TensorRT page. The latest versions of the samples, parsers, and notebooks are always available in the TensorRT open source repo.

Add these GTC sessions to your calendar to learn more:

 > Accelerate Deep Learning Inference with TensorRT 8.0

 > Quantization Aware Training in PyTorch with TensorRT 8.0


Announcing NVIDIA Merlin End-to-End Accelerated Recommender System

Today NVIDIA announced the latest release of NVIDIA Merlin, an open beta application framework that enables the end-to-end development of deep learning recommender systems, from data preprocessing to model training and inference, all accelerated on NVIDIA GPUs. With this release, Merlin delivers a new API and inference support that streamlines the recommender workflow. 

Highlights from this release include:

  • New Merlin API makes it easier to define workflows and training pipelines
  • Deepened support for inference and integration with Triton Inference Server
  • Scales transparently to larger datasets and more complex models 
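
One preprocessing step a framework like Merlin accelerates on GPU is categorical encoding: mapping raw user or item IDs to contiguous integer indices for embedding lookups. A plain-Python sketch of the idea (illustrative only, not the Merlin/NVTabular API):

```python
from collections import Counter

def categorify(column):
    """Map categorical values to contiguous integer ids, ranked by frequency."""
    counts = Counter(column)
    # The most frequent category gets id 1, the next gets 2, and so on
    vocab = {cat: i + 1 for i, (cat, _) in enumerate(counts.most_common())}
    return [vocab[v] for v in column]

clicks = ["item_a", "item_b", "item_a", "item_c", "item_a", "item_b"]
print(categorify(clicks))  # item_a -> 1, item_b -> 2, item_c -> 3
```

In a real recommender pipeline this runs over billions of rows, which is why GPU acceleration of the ETL stage matters alongside training and inference.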

Add these GTC sessions to your calendar to learn more: 

 > End-2-end Deployment of GPU Accelerated Recommender Systems: From ETL to Training to Inference (Training Session)

 > Accelerated ETL, Training and Inference of Recommender Systems on the GPU with Merlin, HugeCTR, NVTabular, and Triton


Announcing Data Labeling & Annotation Partner Services For Transfer Learning Toolkit

Today, NVIDIA announced that it is working with six leading NVIDIA partners to provide solutions for data labeling, making it easy to adapt pretrained models to specific domain data and train quickly and efficiently. These companies are AI Reverie, Appen, Hasty.ai, Labelbox, Sama, and Sky Engine.

Training reliable AI and machine learning models requires vast amounts of accurately labeled data, and acquiring labeled and annotated data at scale is a challenge for many enterprises. Using these integrations, developers can use the partner services and platforms with the NVIDIA Transfer Learning Toolkit (TLT) to perform annotation, use partners’ synthetic data with TLT, or use external annotation tools and then import data into TLT for training and model optimization.

To learn more about the integration, read the Developer Blog post: 

 > Integrating with Data Generation and Labelling Tools for Accurate AI Training

Download the Transfer Learning Toolkit and get started

Add these GTC sessions to your calendar to learn more:

 > Train Smarter not Harder with NVIDIA Pre-trained models and Transfer Learning Toolkit 3.0

 > Connect with the Experts: Transfer Learning Toolkit and DeepStream SDK for Vision AI/Intelligent Video Analytics


Announcing DeepStream 6.0 

NVIDIA DeepStream SDK is the AI streaming analytics toolkit for building high-performance, low-latency, complex video analytics apps and services. Today, NVIDIA announced DeepStream 6.0. This latest version brings a new graphical user interface to help developers build reliable AI applications faster and fast-track the entire workflow from prototyping to deployment across the edge and cloud. With the new GUI and a suite of productivity tools, you can build AI apps in days instead of weeks.

Sign up to be notified for the early access program

Add these GTC sessions to your calendar to learn more: 

 > Bringing Scale and Optimization to Video Analytics Pipelines with NVIDIA Deepstream SDK

 > Connect with the Experts: Transfer Learning Toolkit and DeepStream SDK for Vision AI/Intelligent Video Analytics

 > Full list of intelligent video analytics talks at GTC

Register for GTC this week for more on the latest GPU-accelerated AI technologies.