NGC is a single source for researchers seeking access to deep learning frameworks, HPC applications, and visualization tools essential for their scientific workflows.
This article summarizes the most important improvements to our AI Framework Containers from the last three releases: 19.01 (published today), 18.12, and 18.11.
TensorFlow Highlights
- Updated to the latest versions of TensorFlow 1.12.0, TensorRT 5.0.2, NCCL 2.3.7, cuDNN 7.4.2, DALI 0.6 Beta, and Horovod 0.15.1
- TensorFlow with TensorRT
  - Updated documentation with accuracy benchmarks for 10 common computer vision models
  - Added support for ReLU6, Identity, and dilated convolutions, and published the full list of supported operations
  - Published examples of converting from Checkpoints, SavedModel, and Frozen Graph
  - Published a ResNet50 end-to-end example including TensorBoard visualization
  - Published an SSD end-to-end example inside our containers: workspace/nvidia-examples/inference/object-detection
- Added OpenSeq2Seq’s custom pre-built CTC decoder for the DeepSpeech2, wav2letter, and Jasper models
- NCCL: The tensorflow.contrib.nccl module has been moved into core as tensorflow.python.ops.nccl_ops. User scripts may need to be updated accordingly. No changes are required for Horovod users. For an example of using Horovod, refer to the nvidia-examples/cnn/ directory inside the container.
- More details: 19.01 release notes, 18.12 release notes, 18.11 release notes
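For scripts that have to run on both older and newer containers, the NCCL module move noted above could be handled with a fallback import. The following is a minimal sketch, not an official migration recipe: the two module paths come from the note above, while the helper name `import_first_available` is hypothetical and only illustrates the pattern. It runs even where TensorFlow is not installed, in which case it simply reports that no module was found.

```python
import importlib

# Candidate module paths, newest location first. Both paths are taken from
# the release note above; the rest of this sketch is illustrative.
NCCL_MODULE_CANDIDATES = (
    "tensorflow.python.ops.nccl_ops",  # location after the move into core
    "tensorflow.contrib.nccl",         # location in earlier releases
)


def import_first_available(candidates):
    """Return the first importable module from `candidates`, or None.

    Hypothetical helper: tries each dotted module path in order and
    skips the ones that cannot be imported.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


nccl_ops = import_first_available(NCCL_MODULE_CANDIDATES)
if nccl_ops is None:
    print("No TensorFlow NCCL module found in this environment")
else:
    print("Using NCCL ops from", nccl_ops.__name__)
```

Trying the new location first means a script picks up the core module whenever it exists and only falls back to the contrib path on older releases. As stated above, Horovod users should not need any of this.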
PyTorch Highlights
- Latest versions of PyTorch v1.0.0, NCCL 2.3.7, cuDNN 7.4.2, DALI 0.6 Beta, TensorRT 5.0.2, and Horovod 0.15.1
- Tensor Core Examples, included in the container examples directory
  - An implementation of ResNet50. The ResNet50 v1.5 model is a modified version of the original ResNet50 v1 model.
  - An implementation of GNMT v2. The GNMT v2 model is similar to the one discussed in the paper Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.
  - An implementation of the Neural Collaborative Filtering (NCF) model. The NCF model focuses on providing recommendations, also known as collaborative filtering, with implicit feedback. NCF was first described by Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu and Tat-Seng Chua in the Neural Collaborative Filtering paper.
  - An implementation of the Transformer model architecture. The Transformer model is based on the optimized implementation in Facebook’s Fairseq NLP Toolkit and is built on top of PyTorch. The original version in the Fairseq project was developed using Tensor Cores, which provide a significant training speedup.
- Performance improvement for PyTorch native batch normalization
- Mixed precision SoftMax enabling FP16 inputs, FP32 computations, and FP32 outputs
- More details: 19.01 release notes, 18.12 release notes, 18.11 release notes
MXNet Highlights
- Latest versions of NCCL 2.3.7, cuDNN 7.4.2, DALI 0.6 Beta, TensorRT 5.0.2, Horovod 0.15.1, Amazon Labs Sockeye 1.18.61, and ONNX exporter 0.1
- Tensor Core Examples
  - An implementation of ResNet50, included in the container examples directory. The ResNet50 v1.5 model is a modified version of the original ResNet50 v1 model.
- Performance improvements
  - Added the MXNET_EXEC_ENABLE_ADDTO environment variable, which, when set to 1, increases performance for some networks.
  - Increased performance of the Batchnorm and Batchnorm+Relu operators in FP16 and the NHWC data format.
  - Increased performance when training with small batch sizes.
  - Improved the speed of metrics computation during training, especially when using the TopKAccuracy metric.
  - Added a fused BatchNormAddRelu operator to the MXNet Symbol package (accessible via mx.sym.BatchNormAddRelu).
- Added Horovod support for multi-GPU and multi-node
  - Added support for multi-node via Horovod integration. Currently, you can use it by specifying the horovod type of KVStore.
  - Added the MXNET_UPDATE_ON_KVSTORE environment variable, which controls whether to update parameters using KVStore (default is 1 for KVStore device and 0 for KVStore horovod).
  - Added aggregation of SGD updates, which increases performance when update on KVStore is disabled.
- Updated examples
  - Improved handling of the float32 datatype in examples/image-classification/train_imagenet_runner.
  - Added resnet-v1b as a possible network in the train_imagenet_runner script.
- Profiling
  - Enabled NVIDIA Tools Extension SDK (NVTX) instrumentation.
- More details: 19.01 release notes, 18.12 release notes, 18.11 release notes
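The two MXNet environment variables described above have to be in place before the training process starts. One way to prepare them is sketched below, under stated assumptions: the variable names and the default values (1 for KVStore device, 0 for KVStore horovod) come from the list above, while the helper `build_mxnet_env` and its parameters are hypothetical and exist only for illustration. The sketch does not import MXNet, so it runs anywhere.

```python
import os

# Defaults for MXNET_UPDATE_ON_KVSTORE as stated in the list above.
UPDATE_ON_KVSTORE_DEFAULTS = {"device": "1", "horovod": "0"}


def build_mxnet_env(kvstore_type, enable_addto=True,
                    update_on_kvstore=None, base_env=None):
    """Return a copy of the environment with the MXNet variables set.

    Hypothetical helper. `update_on_kvstore=None` means "use the documented
    default for this KVStore type"; True/False overrides it explicitly.
    """
    env = dict(os.environ if base_env is None else base_env)

    # Only set the ADDTO switch when requested; otherwise leave it untouched.
    if enable_addto:
        env["MXNET_EXEC_ENABLE_ADDTO"] = "1"

    if update_on_kvstore is None:
        value = UPDATE_ON_KVSTORE_DEFAULTS.get(kvstore_type)
    else:
        value = "1" if update_on_kvstore else "0"
    if value is not None:
        env["MXNET_UPDATE_ON_KVSTORE"] = value

    return env


# Example: settings for a Horovod-based run, starting from an empty base
# environment so the result shows only the variables added here.
env = build_mxnet_env("horovod", base_env={})
print(env)
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching a training script. Since MXNET_EXEC_ENABLE_ADDTO is described as helping only some networks, it is worth measuring throughput with and without it.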
NGC features the latest AI frameworks, tuned, tested, and certified by NVIDIA for use on cloud providers with the latest NVIDIA GPUs, so you can accelerate your applications. The updated and optimized AI containers are available today!