GPU Accelerated Deep Learning

The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.

Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning. cuDNN accelerates widely used deep learning frameworks, including Caffe2, MATLAB, Microsoft Cognitive Toolkit, TensorFlow, Theano, and PyTorch. For access to NVIDIA optimized deep learning framework containers, visit NVIDIA GPU CLOUD to learn more and get started.


What’s New in cuDNN 7.1?

Deep learning frameworks using cuDNN 7 and later can leverage the new features and performance of the Volta architecture to deliver up to 3x faster training compared to Pascal GPUs. cuDNN 7.1 highlights include:

  • Automatically select the best RNN implementation with the RNN search API
  • Perform grouped convolutions with any supported convolution algorithm for models such as ResNeXt and Xception
  • Train speech recognition and language models with a projection layer for bidirectional RNNs
  • Better error reporting with logging support for all APIs
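In cuDNN the group count is set on the convolution descriptor (via `cudnnSetConvolutionGroupCount`), and each group of output channels then convolves only its own slice of input channels. The pure-Python sketch below illustrates that channel-partitioning semantics on CPU; the function name and tensor layout (`C, H, W` input, `C_out, C_in/groups, kH, kW` filters) are illustrative choices, not cuDNN API.

```python
def grouped_conv2d(x, w, groups):
    """Reference (CPU) grouped convolution, valid padding, stride 1.

    x: input as nested lists, shape (C_in, H, W)
    w: filters as nested lists, shape (C_out, C_in // groups, kH, kW)
    Each output channel only sees the input channels of its own group,
    which is what makes ResNeXt/Xception-style grouped convolutions cheap.
    """
    c_in, c_out = len(x), len(w)
    h, wd = len(x[0]), len(x[0][0])
    kh, kw = len(w[0][0]), len(w[0][0][0])
    cin_per_g = c_in // groups
    cout_per_g = c_out // groups
    oh, ow = h - kh + 1, wd - kw + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for oc in range(c_out):
        g = oc // cout_per_g          # which group this output channel belongs to
        for i in range(oh):
            for j in range(ow):
                s = 0.0
                for ic in range(cin_per_g):   # only this group's input channels
                    for ki in range(kh):
                        for kj in range(kw):
                            s += x[g * cin_per_g + ic][i + ki][j + kj] * w[oc][ic][ki][kj]
                out[oc][i][j] = s
    return out

# With groups == C_in == C_out this degenerates to a depthwise convolution:
x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]      # 2 channels, 2x2 spatial
w = [[[[2]]], [[[3]]]]                        # two 1x1 filters, one per group
y = grouped_conv2d(x, w, groups=2)
# y[0] == 2 * x[0], y[1] == 3 * x[1]
```

With `groups=1` this is an ordinary dense convolution; raising the group count divides both the parameter count and the multiply-accumulate work by the same factor.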

Read the latest cuDNN release notes for a detailed list of new features and enhancements.

cuDNN Accelerated Frameworks


Key Features

  • Forward and backward paths for many common layer types, such as pooling, LRN, LCN, batch normalization, dropout, CTC, ReLU, sigmoid, softmax, and tanh
  • Forward and backward convolution routines, including cross-correlation, designed for convolutional neural nets
  • LSTM and GRU Recurrent Neural Networks (RNN) and Persistent RNNs
  • Forward and backward passes using FP32 and FP16 (Tensor Cores) data types, and forward pass using UINT8 (Volta and later)
  • Arbitrary dimension ordering, striding, and sub-regions for 4D tensors mean easy integration into any neural network implementation
  • Tensor transformation functions
  • Context-based API allows for easy multithreading
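The "arbitrary dimension ordering and striding" feature works through tensor descriptors: cuDNN APIs such as `cudnnSetTensor4dDescriptorEx` accept an explicit stride per dimension, so the same buffer can be interpreted as NCHW, NHWC, or a strided sub-region. The stride formulas below are a minimal sketch of that addressing scheme; the helper names are illustrative, not cuDNN functions.

```python
def offset(n, c, h, w, strides):
    """Flat element offset of logical index (n, c, h, w) given per-dim strides."""
    sn, sc, sh, sw = strides
    return n * sn + c * sc + h * sh + w * sw

def nchw_strides(N, C, H, W):
    """Fully packed NCHW layout: w varies fastest, then h, c, n."""
    return (C * H * W, H * W, W, 1)

def nhwc_strides(N, C, H, W):
    """Fully packed NHWC layout: c varies fastest, then w, h, n."""
    return (H * W * C, 1, W * C, C)

# The same logical element lands at different flat offsets per layout:
N, C, H, W = 1, 2, 3, 4
offset(0, 0, 1, 2, nchw_strides(N, C, H, W))  # NCHW offset of (0, 0, 1, 2)
offset(0, 0, 1, 2, nhwc_strides(N, C, H, W))  # NHWC offset of the same element
```

Because the library addresses memory through these strides rather than assuming one fixed layout, a framework can hand cuDNN its tensors in whatever ordering it already uses.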

cuDNN is supported on Windows, Linux, and macOS systems with Volta, Pascal, Kepler, Maxwell, Tegra K1, Tegra X1, and Tegra X2 GPUs.

Learn More