The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning. cuDNN accelerates widely used deep learning frameworks, including Caffe2, Chainer, Keras, MATLAB, MXNet, PaddlePaddle, PyTorch, and TensorFlow. For NVIDIA-optimized deep learning framework containers with cuDNN already integrated, visit NVIDIA GPU Cloud.
cuDNN Key Features
- Tensor Core acceleration for all popular convolutions, including 2D, 3D, grouped, depthwise separable, and dilated, with NHWC and NCHW inputs and outputs
- Optimized kernels for computer vision, speech, and language models including ResNet, ResNext, EfficientNet, EfficientDet, SSD, MaskRCNN, Unet, VNet, BERT, GPT-2, Tacotron2, and WaveGlow
- Support for FP32, FP16, BF16, and TF32 floating-point formats and INT8 and UINT8 integer formats
- Support for fusion of memory-limited operations like pointwise and reduction with math-limited operations like convolution and matmul
- Support for Windows and Linux with the latest NVIDIA data center and mobile GPUs
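To make two of the terms above concrete, here is a minimal Python sketch of what "dilated" and "NHWC and NCHW" mean in practice. It does not call cuDNN; the function names are hypothetical and chosen only for illustration. The output-size formula is the standard one for convolutions, and the two offset functions show how the same logical element lands at different positions in memory depending on the layout.

```python
def conv_output_size(in_size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis of a convolution.

    Dilation spreads the kernel taps apart, so a kernel of size k
    covers dilation * (k - 1) + 1 input positions.
    """
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * padding - effective_kernel) // stride + 1


def offset_nchw(n, c, h, w, C, H, W):
    """Linear offset of element (n, c, h, w) when width varies fastest."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Linear offset of element (n, c, h, w) when channels vary fastest."""
    return ((n * H + h) * W + w) * C + c


if __name__ == "__main__":
    # A 3x3 kernel with padding 1 preserves the spatial size.
    print(conv_output_size(224, 3, padding=1))
    # The same kernel with dilation 2 and no padding shrinks 32 to 28.
    print(conv_output_size(32, 3, dilation=2))
    # Channel 1 of the first pixel in a 3-channel 2x2 image:
    print(offset_nchw(0, 1, 0, 0, C=3, H=2, W=2))  # a full H*W plane away
    print(offset_nhwc(0, 1, 0, 0, C=3, H=2, W=2))  # the very next element
```

In NHWC, all channels of one pixel sit next to each other in memory, while in NCHW each channel is stored as a separate plane. Which layout performs better depends on the kernel and hardware, so treat this only as a picture of the indexing, not as a performance guide.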
cuDNN Accelerated Frameworks
- cuDNN Developer Guide
- Deep Dive into cuDNN 8 Webinar
- Blogs on Programming Tensor Cores in cuDNN
- Related libraries and software:
  - NCCL: For fast inter-GPU communication
  - cuBLAS: For GPU-accelerated BLAS routines
  - DALI: For fast AI data preprocessing
  - NVIDIA GPU Cloud: For containers
- Ask questions, provide feedback, and find other cuDNN users on NVIDIA Developer Forums
- To file bugs or report an issue, register on NVIDIA Developer Zone