Learn what’s new in the latest releases of NVIDIA CUDA-X AI libraries and the NGC catalog. For more information, see the package release notes.
NVIDIA Triton Inference Server 2.3
NVIDIA Triton Inference Server (formerly NVIDIA TensorRT Inference Server) simplifies the deployment of AI models at scale in production. It is open-source inference serving software that lets teams deploy trained AI models from any framework (TensorFlow, TensorRT, PyTorch, ONNX Runtime, or a custom framework), from local storage, Google Cloud Platform, or AWS S3, on any GPU- or CPU-based infrastructure (cloud, data center, or edge). This version includes:
- KFServing community-standard gRPC and HTTP/REST protocols for standards-based serverless inferencing
- Latest backends: TensorRT 7.1, TensorFlow 2.2, PyTorch 1.6 & ONNX Runtime 1.4
- Support for A100 GPUs and Multi-Instance GPU (MIG)
- Asynchronous custom backends for accelerating end-to-end Riva speech pipelines
- Triton Model Analyzer to characterize model performance and memory footprint
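The KFServing v2 HTTP/REST protocol sends an inference request as a JSON body POSTed to `/v2/models/<model_name>/infer`. Each input tensor is described by `name`, `shape`, `datatype`, and `data`. The sketch below only builds such a request with the Python standard library and does not send it. The model name `my_model`, the input name `INPUT0`, and the address `localhost:8000` (assumed here to be Triton's default HTTP port) are hypothetical placeholders. The names and shapes a real deployment expects come from its own model configuration.

```python
import json
import urllib.request


def build_infer_request(host, model_name, input_name, data, shape, datatype="FP32"):
    """Build (but do not send) a KFServing v2 HTTP/REST inference request.

    host, model_name, and input_name are caller-supplied placeholders; they
    must match the names configured for the model actually being served.
    """
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": shape,        # e.g. [batch, features]
                "datatype": datatype,  # v2 datatype string such as "FP32" or "INT64"
                "data": data,          # tensor contents flattened in row-major order
            }
        ]
    }
    url = f"http://{host}/v2/models/{model_name}/infer"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model "my_model" with one FP32 input "INPUT0" of shape [1, 4]
req = build_infer_request("localhost:8000", "my_model", "INPUT0", [1.0, 2.0, 3.0, 4.0], [1, 4])
print(req.get_method(), req.full_url)
```

Against a running server, `urllib.request.urlopen(req)` would send the request. The JSON response lists the output tensors in the same `name`/`shape`/`datatype`/`data` form.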
TensorRT 7.2
NVIDIA TensorRT is an SDK for high-performance deep learning inference. This version of TensorRT will be available in Q4 2020 and includes:
- Optimizations for high-quality video effects such as live virtual backgrounds, delivering 30x the performance of CPUs
- New RNN optimizations that speed up applications such as fraud and anomaly detection by 2x
- Fully connected layer optimizations that deliver up to 2.5x faster inference for recommenders and MLPs
NVIDIA NeMo 1.0 Beta
NVIDIA NeMo is a toolkit to develop state-of-the-art conversational AI models in three lines of code. Highlights of this version include:
- Redesigned for interoperability with PyTorch and PyTorch Lightning
- Easy model customization through Hydra framework integration
- Optimized for Tensor Cores on A100 and earlier GPU architectures
- Improved speaker recognition model
nvJPEG2000 0.0.1 Preview
NVIDIA nvJPEG2000 is a library for high-performance decoding of JPEG 2000 images. Applications that rely on nvJPEG2000 for decoding deliver higher throughput and lower latency than CPU-only decoding. Highlights of this version include:
- Output formats: grayscale and color images with arbitrary width and height
- Compression techniques: lossy (CDF 9/7 wavelet) and lossless (CDF 5/3 wavelet) image compression and decompression
- Support for the JP2 file format and JPEG 2000 codestreams
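The lossless mode above relies on the reversible CDF 5/3 (LeGall 5/3) wavelet. It is computed with integer lifting steps, so it can be inverted exactly. The sketch below is a pure-Python 1-D illustration of that lifting scheme. It is not nvJPEG2000's API or implementation. It assumes whole-sample symmetric boundary extension and stores the coefficients interleaved: low-pass values at even indices and high-pass (detail) values at odd indices. A 2-D image transform would apply the same steps along rows and then along columns.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension: x[-1] -> x[1], x[n] -> x[n-2]."""
    period = 2 * (n - 1)
    i %= period  # Python's % is non-negative for a positive period
    return period - i if i >= n else i


def fwd_53(x):
    """Forward reversible 5/3 lifting on a list of ints (interleaved output)."""
    n = len(x)
    if n < 2:
        return list(x)  # a single sample passes through unchanged
    y = list(x)
    # Predict step: each odd sample becomes a high-pass detail coefficient
    for i in range(1, n, 2):
        y[i] = x[i] - (x[_mirror(i - 1, n)] + x[_mirror(i + 1, n)]) // 2
    # Update step: each even sample becomes a low-pass coefficient,
    # using the detail coefficients just computed
    for i in range(0, n, 2):
        y[i] = x[i] + (y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)] + 2) // 4
    return y


def inv_53(y):
    """Inverse lifting: undo the steps in reverse order with opposite signs."""
    n = len(y)
    if n < 2:
        return list(y)
    x = list(y)
    # Undo update: recover the even (original) samples first
    for i in range(0, n, 2):
        x[i] = y[i] - (y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)] + 2) // 4
    # Undo predict: recover odd samples from the reconstructed even samples
    for i in range(1, n, 2):
        x[i] = y[i] + (x[_mirror(i - 1, n)] + x[_mirror(i + 1, n)]) // 2
    return x


print(fwd_53([1, 2, 3, 4]))  # → [1, 0, 3, 1]
print(inv_53(fwd_53([10, 12, 15, 11, 9, 8, 20])) == [10, 12, 15, 11, 9, 8, 20])  # → True
```

Every lifting step adds or subtracts a floored integer computed only from values that are still available during inversion. The inverse can therefore recompute each term and undo it exactly. This is why the 5/3 path is lossless, whereas the floating-point 9/7 path is not.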
DALI 0.27
The NVIDIA Data Loading Library (DALI) is a portable, open-source GPU-accelerated library for decoding and augmenting images and videos to accelerate deep learning applications. This version of DALI includes:
- Support for A100 GPUs, achieving over 2x speedup using the hardware JPEG decoder
- New audio processing operators to accelerate ASR pipelines
- New Jupyter notebooks demonstrating how to load and decode audio data and perform audio feature extraction
Transfer Learning Toolkit 2.0
NVIDIA Transfer Learning Toolkit (TLT) is a simple, zero-coding AI training toolkit for creating AI models from the user’s own data. Key features of TLT include:
- A collection of highly accurate, purpose-built pre-trained models for common use cases such as traffic analytics, parking management, people analytics, and more
- State-of-the-art object detection, instance segmentation and classification network architectures
- Quantization-aware training for accurate INT8-precision inference
- Automatic mixed precision (AMP) training to speed up training using Tensor Cores on NVIDIA GPUs
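Quantization-aware training generally inserts "fake quantization" (quantize, then immediately dequantize) into the forward pass. The network then trains against the same rounding and clamping error it will see at INT8 inference time. The sketch below shows symmetric, per-tensor INT8 fake quantization in pure Python. The scheme is assumed for illustration: a calibrated maximum magnitude `amax`, the code range [-127, 127], and round-to-nearest. It is not a description of TLT's internal implementation. The straight-through gradient used during backpropagation is not shown.

```python
def quantize_int8(values, amax):
    """Map floats to symmetric INT8 codes in [-127, 127] with one per-tensor scale."""
    if amax <= 0:
        raise ValueError("amax must be positive")
    scale = amax / 127.0
    # Round to nearest (Python's round() breaks ties to even), then clamp to the INT8 range
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize_int8(codes, amax):
    """Map INT8 codes back to floats using the same scale."""
    scale = amax / 127.0
    return [q * scale for q in codes]


def fake_quant_int8(values, amax):
    """Quantize then dequantize; in QAT an op like this sits in the forward pass."""
    return dequantize_int8(quantize_int8(values, amax), amax)


weights = [0.013, -0.42, 0.9, 1.7]  # hypothetical weights; 1.7 exceeds amax and is clamped
print(quantize_int8(weights, amax=1.0))
print(fake_quant_int8(weights, amax=1.0))
```

For values inside [-amax, amax], the fake-quantized result differs from the input by at most half a quantization step (amax / 254). Values outside that range saturate to ±amax, which is why the choice of `amax` matters for accuracy.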
The next version of Transfer Learning Toolkit, coming in early 2021, will support speech and natural language understanding (NLU) models. Sign up to be notified when it’s available.
Merlin Open Beta
Merlin is an application framework and ecosystem that enables end-to-end development of recommender systems, accelerated on NVIDIA GPUs. Merlin Open Beta highlights include:
- Added multi-GPU support and NVTabular data loaders to improve interoperability with TensorFlow and PyTorch
- HugeCTR now includes a Parquet data reader to ingest data preprocessed by NVTabular
- Additional optimizations that enable training of DLRM
- Added a set of new operators to the NVTabular library