Latest Updates to NVIDIA CUDA-X AI Libraries

Learn what’s new in the latest releases of NVIDIA’s CUDA-X AI libraries and NGC. For more information on NVIDIA’s developer tools, join live webinars, training, and Connect with the Experts sessions now through GTC Digital.

NVIDIA Collective Communications Library 2.6

NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance-optimized for NVIDIA GPUs. Highlights in this version include:

Up to 2x peak bandwidth with in-network AllReduce operations using SHARPv2
InfiniBand adaptive routing, which reroutes traffic to alleviate congested ports
Topology support for AMD and ARM platforms, PCIe Gen4, and InfiniBand HDR
Enhancements to topology detection, and automatic speed detection of PCIe devices and NICs
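The in-network AllReduce figure is easier to read next to the classic ring algorithm it improves on. Below is a toy, single-process Python simulation of ring AllReduce (reduce-scatter followed by all-gather) plus a back-of-the-envelope traffic model. It is not NCCL's code and the function names are hypothetical. The in-network number assumes an idealized switch-side reduction (the idea behind SHARP) in which each rank sends its buffer once and receives the result once.

```python
"""Toy ring AllReduce (sum) simulation; each 'rank' is just a Python list."""


def ring_allreduce(buffers):
    """Sum-reduce equal-length buffers with a ring schedule.

    Returns (results, elements_sent_per_rank).
    """
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "sketch assumes the buffer splits evenly into n chunks"
    chunk = size // n
    data = [list(b) for b in buffers]  # copy so the inputs are untouched
    sent = [0] * n

    def span(c):
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. After n-1 steps, rank r holds the fully
    # reduced chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all sends in a step are "simultaneous".
        outgoing = [((r - step) % n, [data[r][i] for i in span((r - step) % n)])
                    for r in range(n)]
        for r, (c, payload) in enumerate(outgoing):
            dst = (r + 1) % n
            for i, v in zip(span(c), payload):
                data[dst][i] += v  # receiver accumulates into its copy
            sent[r] += chunk

    # Phase 2: all-gather. Circulate each reduced chunk around the ring.
    for step in range(n - 1):
        outgoing = [((r + 1 - step) % n, [data[r][i] for i in span((r + 1 - step) % n)])
                    for r in range(n)]
        for r, (c, payload) in enumerate(outgoing):
            dst = (r + 1) % n
            for i, v in zip(span(c), payload):
                data[dst][i] = v  # receiver overwrites with the reduced value
            sent[r] += chunk
    return data, sent


def ring_vs_in_network_traffic(n_ranks, size):
    """Elements each rank transmits: ring AllReduce vs. an idealized
    in-network reduction where each rank sends its buffer exactly once."""
    ring = 2 * (n_ranks - 1) * size / n_ranks
    in_network = size
    return ring, in_network


if __name__ == "__main__":
    results, sent = ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40]])
    print(results)  # both ranks: [11, 22, 33, 44]
    print(sent)     # [4, 4]
    print(ring_vs_in_network_traffic(8, 1000))  # (1750.0, 1000)
```

Because ring traffic per rank is 2(N-1)/N times the buffer size, an idealized in-network reduction approaches half of that as N grows, which is one way to read the "up to 2x" bandwidth figure. Real gains depend on the fabric, message sizes, and latency, which this model ignores.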

Download Now

NVIDIA Triton Inference Server 20.03

NVIDIA Triton Inference Server, formerly TensorRT Inference Server, is open-source inference serving software that serves deep learning models in production with maximum GPU utilization. This version includes:

Per-request prioritization, plus request timeouts and drops, via queuing policies in the dynamic batching scheduler
Experimental Python client and server support for the community-standard gRPC inference API
Support for large ONNX models that store weights in separate files
Support for ONNX Runtime optimization levels via model configuration settings
Support for running Triton on older, otherwise unsupported GPUs via the --min-supported-compute-capability flag
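The queuing-policy idea can be illustrated with a toy scheduler. This sketch is not Triton's implementation, and the class and method names are hypothetical: each request carries a priority (a smaller number is served first) and a timeout, expired requests are dropped when a batch is formed, and a full queue rejects new requests outright. In Triton itself these behaviors are configured through the dynamic batching settings of the model configuration; the model configuration documentation lists the exact fields.

```python
"""Toy dynamic-batching queue with priorities, timeouts, and a size bound."""
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    priority: int                               # smaller number = served first
    seq: int                                    # FIFO tie-breaker within a priority
    request_id: str = field(compare=False)
    deadline_us: float = field(compare=False)   # absolute time after which it is dropped


class PriorityBatchQueue:
    def __init__(self, max_batch_size, max_queue_size):
        self.max_batch_size = max_batch_size
        self.max_queue_size = max_queue_size
        self._heap = []
        self._seq = itertools.count()
        self.rejected = []  # ids rejected on enqueue or dropped after timing out

    def _drop_expired(self, now_us):
        expired = [r for r in self._heap if r.deadline_us < now_us]
        self.rejected.extend(r.request_id for r in expired)
        self._heap = [r for r in self._heap if r.deadline_us >= now_us]
        heapq.heapify(self._heap)

    def enqueue(self, request_id, priority, now_us, timeout_us):
        self._drop_expired(now_us)
        if len(self._heap) >= self.max_queue_size:
            self.rejected.append(request_id)  # queue full: reject immediately
            return False
        heapq.heappush(self._heap, Request(priority, next(self._seq),
                                           request_id, now_us + timeout_us))
        return True

    def next_batch(self, now_us):
        """Drop timed-out requests, then pop up to max_batch_size by priority."""
        self._drop_expired(now_us)
        batch = []
        while self._heap and len(batch) < self.max_batch_size:
            batch.append(heapq.heappop(self._heap).request_id)
        return batch


if __name__ == "__main__":
    q = PriorityBatchQueue(max_batch_size=2, max_queue_size=3)
    q.enqueue("a", priority=2, now_us=0, timeout_us=1000)
    q.enqueue("b", priority=1, now_us=0, timeout_us=1000)
    q.enqueue("c", priority=1, now_us=0, timeout_us=100)
    q.enqueue("d", priority=1, now_us=0, timeout_us=1000)  # queue full: rejected
    print(q.next_batch(now_us=500))  # "c" timed out -> ['b', 'a']
    print(q.rejected)                # ['d', 'c']
```

A heap keyed on (priority, arrival order) keeps high-priority requests at the front while preserving FIFO order within each level.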

Download Now

Deep Learning Profiler 0.10

Deep Learning Profiler (DLProf) is a profiling tool that visualizes GPU utilization, which operations are supported by Tensor Cores, and how much Tensor Cores are used during execution. This experimental version includes:

Integration with TensorBoard to visualize results
Expert System Recommendations
Support for profiling with user-defined NVTX markers
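As a rough illustration of what rule-based recommendations over profiler output can look like, the sketch below applies two common Tensor Core rules of thumb to a hypothetical list of op records: FP32 ops are flagged as mixed-precision candidates, and FP16 matrix multiplies whose M, N, or K dimensions are not all multiples of 8 are flagged for padding. The record format, the function name, and the rules are assumptions for illustration; they are not DLProf's actual expert system or output format.

```python
"""Hypothetical, heavily simplified rule-based pass over profiler op records."""
from typing import Dict, List


def recommend(ops: List[Dict]) -> List[str]:
    """Return human-readable tips for ops that likely miss Tensor Cores."""
    tips = []
    for op in ops:
        m, n, k = op["mnk"]  # assumed GEMM shape recorded for each op
        if op["dtype"] == "float32":
            tips.append(f"{op['name']}: runs in FP32; consider automatic mixed "
                        "precision so it can use Tensor Cores")
        elif op["dtype"] == "float16" and any(d % 8 for d in (m, n, k)):
            tips.append(f"{op['name']}: FP16 but M/N/K={m}/{n}/{k} are not all "
                        "multiples of 8; consider padding for Tensor Cores")
    return tips


if __name__ == "__main__":
    ops = [
        {"name": "dense_1", "dtype": "float32", "mnk": (64, 1024, 1024)},
        {"name": "dense_2", "dtype": "float16", "mnk": (64, 30522, 1024)},
        {"name": "dense_3", "dtype": "float16", "mnk": (64, 1024, 1024)},
    ]
    for tip in recommend(ops):
        print(tip)  # dense_1 and dense_2 are flagged; dense_3 is not
```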

Try Deep Learning Profiler in the NGC TensorFlow container

Deep Learning Frameworks and Models in NGC 20.03

NVIDIA provides ready-to-run containers with GPU-accelerated frameworks that include the required CUDA and CUDA-X libraries. NGC also contains optimized models, along with performance benchmarks and the training scripts used to achieve them. Highlights in this release include:

Experimental support for Singularity v3.0
Optimized TransformerXL, ResNeXt101, and SE-ResNeXt models using PyTorch in NGC
New layout optimization option for Automatic Mixed Precision in MXNet, which increases CNN training and inference performance by up to 10%
MXNet container upgraded to MXNet 1.6.0
Deprecation of the TF_ENABLE_AUTO_MIXED_PRECISION environment variable for TensorFlow 2.0; use tf.train.experimental.enable_mixed_precision_graph_rewrite() instead
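In TensorFlow, the graph-rewrite API wraps an existing optimizer, for example `opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)`, and applies dynamic loss scaling by default. The framework-free sketch below shows the dynamic loss scaling idea itself: scale the loss up, skip the step and halve the scale when gradients overflow, and grow the scale again after a run of overflow-free steps. The class name is hypothetical, and the defaults (initial scale 2**15, doubling after 2000 clean steps) are typical values, not a statement of how any NGC container is configured.

```python
"""Minimal dynamic loss scaling sketch (framework-free, illustrative)."""
import math
from typing import List, Optional


class DynamicLossScaler:
    def __init__(self, initial_scale=2.0 ** 15, growth_interval=2000, factor=2.0):
        self.scale = initial_scale
        self.growth_interval = growth_interval
        self.factor = factor
        self._clean_steps = 0

    def scale_loss(self, loss: float) -> float:
        # Multiply the loss so small FP16 gradients do not underflow to zero.
        return loss * self.scale

    def unscale_and_check(self, scaled_grads: List[float]) -> Optional[List[float]]:
        """Return unscaled gradients, or None if any are inf/NaN (skip step)."""
        if not all(math.isfinite(g) for g in scaled_grads):
            self.scale /= self.factor  # overflow: back off and skip this step
            self._clean_steps = 0
            return None
        # Unscale with the scale that produced these gradients.
        unscaled = [g / self.scale for g in scaled_grads]
        self._clean_steps += 1
        if self._clean_steps >= self.growth_interval:
            self.scale *= self.factor  # long overflow-free run: try a larger scale
            self._clean_steps = 0
        return unscaled


if __name__ == "__main__":
    scaler = DynamicLossScaler(initial_scale=1024.0, growth_interval=2)
    print(scaler.unscale_and_check([2048.0, float("inf")]))  # None; scale -> 512.0
    print(scaler.unscale_and_check([512.0, 256.0]))          # [1.0, 0.5]
    print(scaler.unscale_and_check([512.0, 256.0]))          # [1.0, 0.5]; scale -> 1024.0
```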

For details on features, bug fixes, and version compatibility, refer to the container release notes in the documentation.

NGC Repository

DALI 0.20

NVIDIA Data Loading Library (DALI) is a portable, open-source library for GPU-accelerated decoding and augmentation of images and video in deep learning applications. This version includes:

Optimized common speech processing and augmentation operators, including spectrogram, mel filterbank, and MFCC, which can accelerate ASR models such as Jasper and RNN-T
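For readers less familiar with these speech features, the sketch below computes a mel filterbank and MFCCs with the Python standard library: triangular filters spaced evenly on the mel scale are applied to a power spectrum, followed by a log and a DCT-II. It assumes the HTK-style mel formula, a one-sided spectrum with n_fft // 2 + 1 bins, and an unnormalized DCT. It is not DALI's implementation, and DALI's operators may use different defaults and normalization.

```python
"""Minimal mel filterbank + MFCC sketch using only the standard library."""
import math
from typing import List


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)  # HTK-style mel scale


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.

    Returns n_filters rows, each with n_fft // 2 + 1 weights (one per bin).
    """
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge points: filter i rises from edge i to i+1, falls to i+2.
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [b * sample_rate / n_fft for b in range(n_bins)]
    bank = []
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        row = []
        for f in bin_hz:
            if lo <= f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f <= hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(power_spectrum: List[float], bank: List[List[float]], n_coeffs: int) -> List[float]:
    """Mel energies -> log -> DCT-II, keeping the first n_coeffs coefficients."""
    log_mel = [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + 1e-10)
               for row in bank]
    n = len(log_mel)
    return [sum(x * math.cos(math.pi * k * (j + 0.5) / n) for j, x in enumerate(log_mel))
            for k in range(n_coeffs)]


if __name__ == "__main__":
    bank = mel_filterbank(n_filters=10, n_fft=512, sample_rate=16000)
    flat_spectrum = [1.0] * 257  # stand-in for one frame's power spectrum
    print(mfcc(flat_spectrum, bank, n_coeffs=5))
```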