The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning. cuDNN accelerates widely used deep learning frameworks, including Caffe2, Chainer, Keras, MATLAB, MXNet, PaddlePaddle, PyTorch, and TensorFlow. For access to NVIDIA optimized deep learning framework containers that have cuDNN integrated into frameworks, visit NVIDIA GPU CLOUD to learn more and get started.
What’s New in cuDNN 8.2
cuDNN 8.2 is optimized for A100 GPUs, delivering up to 5x higher out-of-the-box performance than V100 GPUs, and includes new optimizations and APIs for applications such as conversational AI and computer vision. It has been redesigned for ease of use and application integration, and offers developers greater flexibility.
cuDNN 8.2 highlights include:
- Support for BFloat16 for CNNs on NVIDIA Ampere architecture GPUs
- Runtime fusion of operators such as convolutions, point-wise operations, and reductions to speed up CNNs
- Faster out-of-the-box performance with new dynamic kernel selection infrastructure
- Up to 2x higher RNN performance with new optimizations and heuristics
Read the latest cuDNN release notes for a detailed list of new features and enhancements.
Key Features
- Tensor Core acceleration for all popular convolutions, including 2D, 3D, grouped, depth-wise separable, and dilated, with NHWC and NCHW inputs and outputs
- Optimized kernels for computer vision and speech models including ResNet, ResNeXt, EfficientNet, EfficientDet, SSD, Mask R-CNN, U-Net, V-Net, BERT, GPT-2, Tacotron2, and WaveGlow
- Supports FP32, FP16, BF16, and TF32 floating-point formats, as well as INT8 and UINT8 integer formats
- Arbitrary dimension ordering, striding, and sub-regions for 4D tensors mean easy integration into any neural network implementation
- Speed up fused operations on any CNN architecture
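The arbitrary-striding model behind that feature can be illustrated with a small calculation: cuDNN describes a 4D tensor by its dimensions plus a per-dimension stride, so NCHW and NHWC are simply two stride choices over the same linear buffer. The sketch below is self-contained plain C++ with no cuDNN calls; the helper names are ours, not part of the cuDNN API.

```cpp
#include <array>
#include <cstddef>

// Strides for an NCHW-laid-out tensor of shape {N, C, H, W}:
// element (n, c, h, w) lives at offset n*C*H*W + c*H*W + h*W + w.
std::array<std::size_t, 4> nchw_strides([[maybe_unused]] std::size_t N,
                                        std::size_t C, std::size_t H, std::size_t W) {
    return {C * H * W, H * W, W, 1};
}

// Strides for the same logical shape stored channels-last (NHWC):
// element (n, c, h, w) lives at offset n*H*W*C + h*W*C + w*C + c.
std::array<std::size_t, 4> nhwc_strides([[maybe_unused]] std::size_t N,
                                        std::size_t C, std::size_t H, std::size_t W) {
    return {H * W * C, 1, W * C, C};
}

// Flat offset of logical index (n, c, h, w) under a given stride set.
std::size_t offset(const std::array<std::size_t, 4>& s,
                   std::size_t n, std::size_t c, std::size_t h, std::size_t w) {
    return n * s[0] + c * s[1] + h * s[2] + w * s[3];
}
```

Because layout is expressed purely through strides, the same scheme also covers sub-regions (by offsetting the base pointer and keeping the parent's strides), which is what lets cuDNN consume tensors from any framework's native memory layout.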
cuDNN Accelerated Frameworks
Resources
- NVIDIA Deep Learning SDK documentation
- Deep Dive into cuDNN 8 Webinar
- Blogs on Programming Tensor Cores in cuDNN
- Related libraries and software
- Find other cuDNN developers on NVIDIA Developer Forums
- For questions or to provide feedback, please contact cuDNN@nvidia.com
- To file bugs or report an issue, register on NVIDIA Developer Zone