The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.

Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning. cuDNN accelerates widely used deep learning frameworks, including Caffe2, Chainer, Keras, MATLAB, MXNet, PaddlePaddle, PyTorch, and TensorFlow. For NVIDIA-optimized deep learning framework containers with cuDNN already integrated, visit NVIDIA GPU Cloud (NGC) to learn more and get started.

Figure: A100 is over 5x faster than V100 with cuDNN 8.1. Throughput of a single DGX-1V server with cuDNN 7.6.5 vs. a DGX-A100 with cuDNN 8.1.1, on the 21.02 NGC container. End-to-end performance, runs to convergence.

What’s New in cuDNN 8.3

cuDNN 8.3 is optimized for A100 GPUs, delivering up to 5x higher performance than V100 GPUs out of the box, and includes new optimizations and APIs for applications such as conversational AI and computer vision. It has been redesigned for ease of use and application integration, and offers developers greater flexibility.

cuDNN 8.3 highlights include:

  • Optimizations accelerating transformer-based models
  • Runtime fusion to compile kernels on the fly with new operators, heuristics and fusions
  • Reduced download package size by 30%

cuDNN 8.3 is now available as six smaller libraries, providing granularity when integrating into applications. Developers can download cuDNN or pull it from framework containers on NGC.

Read the latest cuDNN release notes for a detailed list of new features and enhancements.


cuDNN Key Features

  • Tensor Core acceleration for all popular convolutions including 2D, 3D, Grouped, Depth-wise separable, and Dilated with NHWC and NCHW inputs and outputs
  • Optimized kernels for computer vision and speech models including ResNet, ResNext, EfficientNet, EfficientDet, SSD, MaskRCNN, Unet, VNet, BERT, GPT-2, Tacotron2 and WaveGlow
  • Supports FP32, FP16, BF16, and TF32 floating point formats and INT8 and UINT8 integer formats
  • Arbitrary dimension ordering, striding, and sub-regions for 4D tensors make integration into any neural net implementation easy
  • Speed up fused operations on any CNN architecture
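
To make the "dimension ordering and striding" feature above concrete, here is a minimal Python sketch of how per-axis strides differ between the NCHW and NHWC layouts for the same 4D tensor. It does not call cuDNN; the helper names `packed_strides` and `offset` are hypothetical and exist only for this illustration. In the cuDNN C API, strides of this kind are what a strided tensor descriptor (for example, one set up with `cudnnSetTensor4dDescriptorEx`) is given.

```python
def packed_strides(dims, order):
    """Return element strides for a fully packed tensor.

    dims  -- mapping from axis name to extent, e.g. {"N": 2, "C": 3, "H": 4, "W": 5}
    order -- axis names from outermost to innermost in memory, e.g. "NCHW"
    """
    strides = {}
    step = 1
    # Walk from the innermost (fastest-varying) axis outward.
    for axis in reversed(order):
        strides[axis] = step
        step *= dims[axis]
    return strides


def offset(index, strides):
    """Linear element offset of a logical (N, C, H, W) index."""
    return sum(index[axis] * strides[axis] for axis in index)


dims = {"N": 2, "C": 3, "H": 4, "W": 5}
nchw = packed_strides(dims, "NCHW")  # channels-first layout
nhwc = packed_strides(dims, "NHWC")  # channels-last layout

# Moving one step along C costs a whole H*W plane in NCHW,
# but only a single element in NHWC.
print("NCHW stride of C:", nchw["C"])
print("NHWC stride of C:", nhwc["C"])

# The same logical element lives at different offsets in each layout.
idx = {"N": 0, "C": 1, "H": 0, "W": 0}
print("offset in NCHW:", offset(idx, nchw))
print("offset in NHWC:", offset(idx, nhwc))
```

Because the logical index stays (N, C, H, W) and only the strides change, a library that accepts explicit strides can read either layout, or a sub-region of a larger buffer, without the caller copying data first.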

cuDNN is supported on Windows and Linux with Ampere, Turing, Volta, Pascal, Maxwell, and Kepler GPU architectures in data center and mobile GPUs.

cuDNN Accelerated Frameworks


cuDNN Resources