NVIDIA Releases Updates to CUDA-X AI Libraries

Learn what’s new in the latest releases of NVIDIA CUDA-X AI libraries and the NGC catalog. For more information, see the package release notes.

NVIDIA Triton Inference Server 2.3

NVIDIA Triton Inference Server (formerly NVIDIA TensorRT Inference Server) simplifies the deployment of AI models at scale in production. It is open source inference serving software that lets teams deploy trained AI models from any framework (TensorFlow, TensorRT, PyTorch, ONNX Runtime, or a custom framework), from local storage, Google Cloud Platform, or AWS S3, on any GPU- or CPU-based infrastructure (cloud, data center, or edge). This version includes:

KFServing community standard gRPC and HTTP/REST protocols for standards-based serverless inferencing
Latest backends: TensorRT 7.1, TensorFlow 2.2, PyTorch 1.6, and ONNX Runtime 1.4
Support for A100 GPUs and Multi-Instance GPU (MIG)
Asynchronous custom backends for Riva end-to-end speech pipeline acceleration
Triton Model Analyzer to characterize model performance and memory footprint
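
As an illustration of the KFServing community standard HTTP/REST protocol mentioned above, the following minimal sketch builds the URL and JSON body of an inference request without sending it. The model name `my_model`, the input tensor name `INPUT0`, and the address `http://localhost:8000` are assumptions chosen for the example, and the helper functions are hypothetical, not part of any Triton client library.

```python
import json


def infer_url(base_url, model_name):
    """Build the inference endpoint URL for a model (v2 protocol path)."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"


def build_infer_request(input_name, shape, datatype, data):
    """Build a request body with a single input tensor.

    `data` is a flat, row-major list whose length must match `shape`.
    """
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError(f"shape {shape} needs {expected} values, got {len(data)}")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(data),
            }
        ]
    }


# Assumed names: "my_model" and "INPUT0" are placeholders for a deployed model.
url = infer_url("http://localhost:8000", "my_model")
body = json.dumps(build_infer_request("INPUT0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4]))
```

The resulting `body` would be sent as an HTTP POST to `url`. The tensor names, shapes, and datatypes must match the configuration of the model that is actually deployed.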

TensorRT 7.2

NVIDIA TensorRT is an SDK for high-performance deep learning inference. This version of TensorRT will be available in Q4 2020 and includes:

Optimizations for high-quality video effects such as live virtual background, delivering 30x the performance of CPUs
New RNN optimizations that speed up applications such as fraud and anomaly detection by 2x
Fully connected layer optimizations that deliver up to 2.5x faster inference for recommenders and MLPs

NVIDIA NeMo 1.0 Beta

NVIDIA NeMo is a toolkit to develop state-of-the-art conversational AI models in three lines of code. Highlights of this version include:

Redesigned and interoperable with PyTorch and PyTorch Lightning
Easy customization of models with Hydra framework integration
Optimized for Tensor Cores on A100 and earlier GPU architectures
Improved speaker recognition model

nvJPEG2000 0.0.1 Preview

NVIDIA nvJPEG2000 is a library for high-performance decoding of JPEG 2000 format images. Applications that rely on nvJPEG2000 for decoding deliver higher throughput and lower latency than CPU-only decoding. Highlights of this version include:

Output formats: grayscale and color images with arbitrary width and height
Compression technique: lossy (CDF 9/7 wavelet) and lossless (CDF 5/3 wavelet) image compression and decompression
Support for the JP2 file format and the JPEG 2000 codestream
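
To show why the CDF 5/3 wavelet can be lossless, here is a minimal one-dimensional sketch of the reversible 5/3 lifting transform on integer samples. It is not nvJPEG2000 code and does not use its API. The function names are hypothetical, and a real codec applies the transform in two dimensions over several decomposition levels.

```python
def _predict(even, i):
    """Floor of the average of the two even neighbors of odd sample i."""
    left = even[i]
    # Symmetric extension: reuse the last even sample past the right border.
    right = even[i + 1] if i + 1 < len(even) else even[i]
    return (left + right) // 2


def _update(d, i):
    """Integer update term computed from the two neighboring detail values."""
    if not d:
        return 0
    # Symmetric extension at both borders of the detail signal.
    left = d[i - 1] if i >= 1 else d[0]
    right = d[i] if i < len(d) else d[-1]
    return (left + right + 2) // 4


def forward_53(x):
    """Split integer samples into low-pass (s) and high-pass (d) bands."""
    even, odd = x[0::2], x[1::2]
    d = [odd[i] - _predict(even, i) for i in range(len(odd))]
    s = [even[i] + _update(d, i) for i in range(len(even))]
    return s, d


def inverse_53(s, d):
    """Undo forward_53 exactly by reversing the lifting steps."""
    even = [s[i] - _update(d, i) for i in range(len(s))]
    odd = [d[i] + _predict(even, i) for i in range(len(d))]
    out = []
    for i in range(len(even)):
        out.append(even[i])
        if i < len(odd):
            out.append(odd[i])
    return out


samples = [10, 12, 14, 200, 3, 7, 7, 9, 1]
low, high = forward_53(samples)
restored = inverse_53(low, high)
```

Each lifting step only adds or subtracts an integer computed from values the inverse can also see, so the inverse recovers the input exactly. The 9/7 wavelet, by contrast, uses floating-point coefficients and is paired with quantization, which is why it is the lossy path.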

DALI 0.27

The NVIDIA Data Loading Library (DALI) is a portable, open-source GPU-accelerated library for decoding and augmenting images and videos to accelerate deep learning applications. This version of DALI includes:

Support for A100 GPUs, achieving over 2x speedup using the hardware JPEG decoder
New audio processing operators to accelerate ASR pipelines
New Jupyter notebooks demonstrating how to load and decode audio data and perform audio feature extraction

Transfer Learning Toolkit 2.0

NVIDIA Transfer Learning Toolkit (TLT) is a simple, easy-to-use, zero-coding AI training toolkit for creating AI models using the user’s own data. The key features of TLT include:

A collection of highly accurate, purpose-built pre-trained models for common use cases such as traffic analytics, parking management, people analytics, and more
State-of-the-art object detection, instance segmentation, and classification network architectures
Quantization-aware training for accurate INT8 inference
Automatic mixed precision (AMP) training to reduce training time using Tensor Cores on NVIDIA GPUs
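
Quantization-aware training is commonly implemented by simulating INT8 rounding during the forward pass, so the network learns weights that tolerate it. The sketch below shows that simulated ("fake") quantization step in plain Python. It is an assumption about the general technique, not a description of how TLT implements it, and the function names are hypothetical.

```python
def symmetric_scale(values):
    """Pick a scale so the largest magnitude maps to the INT8 value 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127 if max_abs else 1.0


def fake_quantize(values, scale):
    """Round values to the INT8 grid, then map them back to floats.

    The output stays in floating point but only takes values that an
    INT8 tensor with this scale could represent.
    """
    out = []
    for v in values:
        q = round(v / scale)        # nearest integer step
        q = max(-128, min(127, q))  # clamp to the signed 8-bit range
        out.append(q * scale)
    return out


weights = [-1.0, 0.5, 0.25, 1.0]
scale = symmetric_scale(weights)
simulated = fake_quantize(weights, scale)
```

For values inside the representable range, the difference between `weights` and `simulated` is at most half a quantization step. Training against the simulated values exposes the model to that rounding error before deployment.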

The next version of Transfer Learning Toolkit, coming in early 2021, will support speech and natural language understanding (NLU) models. Sign up to be notified when it’s available.

Merlin Open Beta

Merlin is an application framework and ecosystem that enables end-to-end development of recommender systems, accelerated on NVIDIA GPUs. Merlin Open Beta highlights include:

Added multi-GPU support and NVTabular data loaders to improve interoperability with TensorFlow and PyTorch
Released HugeCTR with a Parquet data reader to ingest data preprocessed by NVTabular
Added optimizations to enable the training of DLRM
Added a set of new operators to the NVTabular library