At GTC 2017, NVIDIA announced Volta-optimized updates to the NVIDIA Deep Learning SDK. Today, we’re making these updates available as free downloads to members of the NVIDIA Developer Program.
Deep learning frameworks using NVIDIA cuDNN 7 and NCCL 2 can take advantage of new features and performance benefits of the Volta architecture.
cuDNN 7 highlights:
- Up to 2.5x faster training of ResNet50 and 3x faster training of NMT language translation LSTM RNNs on Tesla V100 vs. Tesla P100
- Accelerated convolutions using mixed-precision Tensor Core operations on Volta GPUs
- Grouped convolutions for models such as ResNeXt and Xception, and a CTC (Connectionist Temporal Classification) loss layer for temporal classification tasks
NCCL 2 highlights:
- Delivers over 90% multi-node scaling efficiency using up to 8 GPU-accelerated servers
- Performs automatic topology detection to determine the optimal communication path
- Optimized to achieve high bandwidth over PCIe and the NVLink high-speed interconnect
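For readers unfamiliar with the term, the multi-node scaling efficiency figure above is commonly understood as measured throughput divided by ideal linear scaling. The sketch below illustrates that arithmetic only; the function name `scaling_efficiency` and the throughput numbers are hypothetical and are not taken from NVIDIA's benchmarks.

```python
def scaling_efficiency(single_node_throughput, multi_node_throughput, num_nodes):
    """Return measured throughput as a fraction of ideal linear scaling."""
    if num_nodes < 1 or single_node_throughput <= 0:
        raise ValueError("need at least one node and a positive baseline throughput")
    # Ideal scaling: N nodes deliver N times the single-node throughput.
    ideal_throughput = single_node_throughput * num_nodes
    return multi_node_throughput / ideal_throughput


# Hypothetical measurements (images/sec), chosen only for illustration.
one_server = 1000.0
eight_servers = 7400.0

efficiency = scaling_efficiency(one_server, eight_servers, 8)
print(f"Scaling efficiency on 8 servers: {efficiency:.1%}")
```

With these made-up numbers, 8 servers reach 7,400 of an ideal 8,000 images/sec, which is 92.5% efficiency.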
Learn more about Volta’s Tensor Cores and multi-node scaling of deep learning training:
- Inside Volta: The World’s Most Advanced Data Center GPU
- Optimized inter-GPU collective operations with NCCL 2