GPU Accelerated Deep Learning

The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.

Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning. cuDNN accelerates widely used deep learning frameworks, including Caffe2, MATLAB, Microsoft Cognitive Toolkit, TensorFlow, Theano, and PyTorch. See supported frameworks for more details. cuDNN is freely available to members of the NVIDIA Developer Program

What’s New in cuDNN 7?

Deep learning frameworks using cuDNN 7 can leverage new features and performance of the Volta architecture to deliver up to 3x faster training performance compared to Pascal GPUs. cuDNN 7 is now available as a free download to the members of the NVIDIA Developer Program. Highlights include:

  • Up to 2.5x faster training of ResNet50 and 3x faster training of NMT language translation LSTM RNNs on Tesla V100 vs. Tesla P100
  • Accelerated convolutions using mixed-precision Tensor Cores operations on Volta GPUs
  • Grouped Convolutions for models such as ResNeXt and Xception and CTC (Connectionist Temporal Classification) loss layer for temporal classification


cuDNN Accelerated Frameworks


Key Features

  • Forward and backward paths for many common layer types such as pooling, LRN, LCN, batch normalization, dropout, CTC, ReLU, Sigmoid, softmax and Tanh
  • Forward and backward convolution routines, including cross-correlation, designed for convolutional neural nets
  • LSTM and GRU Recurrent Neural Networks (RNN) and Persistent RNNs
  • Arbitrary dimension ordering, striding, and sub-regions for 4d tensors means easy integration into any neural net implementation
  • Tensor transformation functions
  • Context-based API allows for easy multithreading

cuDNN is supported on Windows, Linux and MacOS systems with Volta, Pascal, Kepler, Maxwell Tegra K1, Tegra X1 and Tegra X2 GPUs.

Learn More