Learn what’s new in the latest releases of NVIDIA’s CUDA-X AI libraries and NGC. For more information on NVIDIA’s developer tools, join live webinars, training, and Connect with the Experts sessions now through GTC Digital.
NVIDIA Collective Communications Library 2.6
NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs. Highlights in this version include:
- Up to 2x peak bandwidth with in-network AllReduce operations using SHARP v2
- InfiniBand adaptive routing reroutes traffic to alleviate congested ports
- Topology support for AMD, ARM, PCI Gen4 and IB HDR
- Enhancements to topology detection and automatic speed detection of PCI and NICs
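For readers new to collectives: an AllReduce leaves every participant holding the element-wise reduction of all participants' buffers. The following is a minimal pure-Python sketch of a ring-style sum AllReduce, simulated on lists in a single process. It does not use the NCCL API, and it does not model how SHARP performs the reduction inside the network. The function name `ring_allreduce` and the list-of-lists stand-in for per-GPU buffers are illustrative assumptions.

```python
def ring_allreduce(buffers):
    """Simulate a sum AllReduce over equally sized per-rank buffers on a ring.

    `buffers` is a list with one list of numbers per rank (hypothetical
    stand-in for per-GPU device buffers). Returns new per-rank lists.
    """
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all ranks must contribute buffers of the same length")
    data = [list(b) for b in buffers]  # work on copies, leave inputs untouched
    if n == 1:
        return data

    # Split the index range into n contiguous chunks (some may be empty).
    bounds = [(i * length) // n for i in range(n + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    def exchange(first_chunk_offset, step, combine):
        # Snapshot outgoing payloads so all transfers in a step are simultaneous.
        sends = []
        for r in range(n):
            c = (r + first_chunk_offset - step) % n
            sends.append((c, [data[r][i] for i in chunk(c)]))
        for r, (c, payload) in enumerate(sends):
            dst = (r + 1) % n  # each rank sends to its right-hand neighbor
            for i, v in zip(chunk(c), payload):
                data[dst][i] = combine(data[dst][i], v)

    # Phase 1, reduce-scatter: after n-1 steps, rank r holds the fully
    # reduced chunk (r + 1) % n.
    for step in range(n - 1):
        exchange(0, step, lambda mine, incoming: mine + incoming)

    # Phase 2, all-gather: circulate the reduced chunks until every rank
    # holds all of them.
    for step in range(n - 1):
        exchange(1, step, lambda mine, incoming: incoming)

    return data


if __name__ == "__main__":
    ranks = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, result in enumerate(ring_allreduce(ranks)):
        print(rank, result)
```

Each rank sends and receives roughly 2(n-1)/n of its buffer in total, which is why ring-style schemes are a common baseline when comparing against in-network reduction.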
NVIDIA Triton Inference Server 20.03
NVIDIA Triton Inference Server, formerly TensorRT Inference Server, is open-source inference serving software for deploying deep learning models in production with maximum GPU utilization. This version includes:
- Per-request prioritization and timeouts/drops of requests via queuing policies in the dynamic batching scheduler
- Experimental Python client and server support for the community-standard gRPC inferencing API
- Support for large ONNX models that store weights across separate files
- Support for ONNX Runtime optimization levels via model configuration settings
- Support for running Triton on older, unsupported GPUs via the --min-supported-compute-capability flag
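To make the first item above more concrete, here is a minimal pure-Python sketch of a request queue that orders requests by priority and drops those that waited past their timeout before a batch is formed. It is not Triton's implementation or configuration syntax. The class `PriorityRequestQueue`, its method names, and the convention that a lower number means higher priority are all assumptions made for illustration.

```python
import heapq
import itertools


class PriorityRequestQueue:
    """Hypothetical queue: priority ordering plus per-request timeouts."""

    def __init__(self, max_queue_size=0):
        self._heap = []
        # Monotonic counter breaks ties so equal priorities stay FIFO.
        self._counter = itertools.count()
        self.max_queue_size = max_queue_size  # 0 means unbounded

    def enqueue(self, request_id, priority, timeout_s, now):
        """Queue a request; return False if the queue is full (request rejected)."""
        if self.max_queue_size and len(self._heap) >= self.max_queue_size:
            return False
        deadline = now + timeout_s
        heapq.heappush(
            self._heap, (priority, next(self._counter), deadline, request_id)
        )
        return True

    def next_batch(self, max_batch_size, now):
        """Pop up to max_batch_size live requests, highest priority first.

        Returns (batch, dropped), where `dropped` lists requests whose
        deadline had already passed when they reached the front of the queue.
        """
        batch, dropped = [], []
        while self._heap and len(batch) < max_batch_size:
            _, _, deadline, request_id = heapq.heappop(self._heap)
            if now > deadline:
                dropped.append(request_id)
            else:
                batch.append(request_id)
        return batch, dropped


if __name__ == "__main__":
    q = PriorityRequestQueue()
    q.enqueue("a", priority=2, timeout_s=1.0, now=0.0)
    q.enqueue("b", priority=1, timeout_s=1.0, now=0.0)
    q.enqueue("c", priority=1, timeout_s=0.1, now=0.0)
    q.enqueue("d", priority=2, timeout_s=1.0, now=0.0)
    print(q.next_batch(max_batch_size=2, now=0.5))
    print(q.next_batch(max_batch_size=2, now=0.5))
```

In this sketch a timed-out request is simply dropped. A real scheduler could instead reject it with an error or defer it behind live requests, which is the kind of choice a queuing policy expresses.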
Deep Learning Profiler 0.10
Deep Learning Profiler (DLProf) is a profiling app that visualizes GPU utilization, the operations supported by Tensor Cores, and their usage during execution. This experimental version includes:
- Integration with TensorBoard to visualize results
- Expert System Recommendations
- Support for profiling with user defined NVTX markers
Try Deep Learning Profiler in the NGC TensorFlow container.
Deep Learning Frameworks and Models in NGC 20.03
NVIDIA provides ready-to-run containers with GPU-accelerated frameworks that include the required CUDA and CUDA-X libraries. In addition, NGC contains optimized models, performance benchmarks, and the training scripts used to achieve them. Highlights in this release include:
- Experimental support for Singularity v3.0
- Optimized Transformer-XL, ResNeXt101, and SE-ResNeXt models using PyTorch in NGC
- New layout optimization option for Automatic Mixed Precision in MXNet increases training and inference performance for CNNs by up to 10%
- MXNet container upgraded to MXNet 1.6.0
- The TF_ENABLE_AUTO_MIXED_PRECISION environment variable is deprecated for TensorFlow 2.0; use tf.train.experimental.enable_mixed_precision_graph_rewrite() instead
For details on features, bug fixes, and version compatibility, refer to the release notes in the documentation for each container.
DALI 0.20
NVIDIA Data Loading Library (DALI) is a portable, open-source library to GPU-accelerate decoding and augmentation of image/video in deep learning apps. This version includes:
- Optimizations for common speech processing and augmentation operators including spectrogram, mel filterbank and MFCC that can accelerate ASR models such as Jasper and RNN-T
- New tutorials (Jupyter notebook examples)
Refer to each package’s release notes in documentation for additional information.
(Originally published on March 31, 2020)