The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.

Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning. cuDNN accelerates widely used deep learning frameworks, including Caffe, Caffe2, Chainer, Keras, MATLAB, MXNet, TensorFlow, and PyTorch. These frameworks, with cuDNN integrated, are used to develop applications such as conversational AI, computer vision, and recommender systems. Access NVIDIA-optimized deep learning framework containers from NGC.

cuDNN Accelerated Frameworks


Key Features

  • Grouped convolutions support NHWC inputs/outputs and FP16/FP32 compute for models such as ResNet and Xception
  • Dilated convolutions use mixed-precision Tensor Core operations for applications such as semantic segmentation, image super-resolution, and denoising
  • Tensor Core acceleration with FP32 inputs and outputs (previously restricted to FP16 inputs)
  • RNN cells support multiple use cases with options for cell clipping and padding masks
  • Automatic selection of the best RNN implementation with the RNN search API
  • Arbitrary dimension ordering, striding, and sub-regions for 4D tensors mean easy integration into any neural net implementation
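To make the first bullet concrete: a grouped convolution splits the input and output channels into independent groups, each convolved with its own filters (setting the group count equal to the channel count yields a depthwise convolution, as used in Xception). In cuDNN's C API this is configured with cudnnSetConvolutionGroupCount; the sketch below is a library-free NumPy illustration of the computation itself, with all function names our own:

```python
import numpy as np

def conv2d_nhwc(x, w):
    """Naive valid 2D convolution. x: (H, W, Cin), w: (kh, kw, Cin, Cout)."""
    H, W, Cin = x.shape
    kh, kw, _, Cout = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1, Cout))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]            # (kh, kw, Cin)
            out[i, j] = np.tensordot(patch, w, axes=3)  # contract over kh, kw, Cin
    return out

def grouped_conv2d_nhwc(x, w, groups):
    """Grouped convolution: each output-channel group sees only its own
    input-channel group. w: (kh, kw, Cin // groups, Cout)."""
    cin_g = x.shape[-1] // groups
    cout_g = w.shape[-1] // groups
    outs = []
    for g in range(groups):
        xg = x[..., g * cin_g:(g + 1) * cin_g]          # this group's input channels
        wg = w[..., g * cout_g:(g + 1) * cout_g]        # this group's filters
        outs.append(conv2d_nhwc(xg, wg))
    return np.concatenate(outs, axis=-1)
```

With `groups=1` this reduces to an ordinary convolution; with `groups` equal to the input channel count it is depthwise. The NHWC layout here mirrors the layout the bullet says cuDNN's grouped convolutions support.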

cuDNN is supported on Windows, Linux, and macOS systems with Volta, Pascal, Kepler, and Maxwell architecture GPUs, as well as Tegra K1, Tegra X1, Tegra X2, and Jetson Xavier.

What’s New in cuDNN 7.6

Deep learning frameworks using cuDNN 7.6 can leverage new features and the performance of the Volta and Turing architectures to deliver faster training.

  • Speed up networks with 3D convolutions, such as VNet, on Tensor Cores
  • Tensor Core support for grouped convolutions for accelerating networks such as ResNext
  • New API to speed up fused convolutions and batchnorm operations in networks such as ResNet
  • Faster inference with new kernels and APIs that speed up operations using filter layout transformations
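One way to see why fusing a convolution with the batchnorm that follows it pays off: at inference time, a batchnorm with frozen statistics can be folded entirely into the convolution's weights and bias, so the two layers cost one kernel launch instead of two. A minimal NumPy sketch of the algebra (illustrative only, not the cuDNN API), using a 1×1 convolution written as a per-pixel matrix product:

```python
import numpy as np

def batchnorm(z, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batchnorm with frozen per-channel statistics."""
    return gamma * (z - mean) / np.sqrt(var + eps) + beta

def fold_bn_into_conv(W, gamma, beta, mean, var, eps=1e-5):
    """Fold batchnorm into a 1x1 conv W of shape (Cin, Cout):
    BN(x @ W) == x @ W_folded + b_folded for all x."""
    scale = gamma / np.sqrt(var + eps)   # per output channel
    W_folded = W * scale                 # scale each output column
    b_folded = beta - mean * scale
    return W_folded, b_folded
```

The same per-output-channel scaling applies to a full k×k convolution; only the weight indexing changes. cuDNN 7.6's fused API goes further by also fusing the training-time pattern, but the inference-time folding above is the simplest case of the saving.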

Learn more in the cuDNN 7.6 release notes.


Learn More