
cuDNN 5

  • LSTM recurrent neural networks deliver up to 6x speedup in Torch.
  • Perform training up to 44% faster on a single Pascal GPU.
  • Accelerate networks with 3x3 convolutions, such as VGG, GoogLeNet, and ResNets.
  • Improve performance and reduce memory usage with FP16 routines on Pascal GPUs.

cuDNN 4

The new cuDNN 4 release delivers faster neural network training and is optimized for NVIDIA Maxwell GPUs.

  • Train neural networks up to 14x faster using Google’s Batch Normalization technique.
  • Increase training and inference performance for convolutional layers by up to 2x with the new 2D tiled FFT algorithm.
  • Accelerate inference for convolutional layers on small batch sizes by up to 2x on Maxwell-architecture GPUs.
  • Optimize for energy-efficient inference with 10x better performance/Watt on Jetson TX1.

cuDNN 3

  • Train models up to 2x faster on Maxwell and Kepler-powered GPUs.
  • 2D convolutions optimized specifically for Maxwell.
  • FFT convolutions accelerate larger filter sizes (5x5 and larger).
  • Train up to 2x larger models with 16-bit floating point (FP16) data storage.
  • Heuristics to select the optimal algorithm for target hardware.
  • Support for 3D non-convolution layers.
  • LCN, LRN, and logarithmic softmax layer types.
  • Optional deterministic backprop.
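
The FP16 storage claim above comes down to simple arithmetic: a half-precision value occupies 2 bytes instead of 4, so a fixed memory budget holds twice as many stored values. The sketch below is illustrative only and does not call cuDNN; the 1 GiB budget and the helper names are assumptions made for the example. It also shows the cost of the narrower format by rounding a value through half precision.

```python
import struct

# Standard sizes for IEEE 754 single ("f") and half ("e") precision.
FP32_BYTES = struct.calcsize("<f")
FP16_BYTES = struct.calcsize("<e")

def elements_that_fit(budget_bytes, bytes_per_element):
    """Number of stored values that fit in a fixed memory budget."""
    return budget_bytes // bytes_per_element

def to_fp16_and_back(value):
    """Round a Python float to half precision, then widen it again."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

budget = 1 << 30  # hypothetical 1 GiB reserved for tensor data
fp32_capacity = elements_that_fit(budget, FP32_BYTES)
fp16_capacity = elements_that_fit(budget, FP16_BYTES)

print("FP32 values:", fp32_capacity)
print("FP16 values:", fp16_capacity)
print("capacity ratio:", fp16_capacity / fp32_capacity)
print("0.1 stored as FP16:", to_fp16_and_back(0.1))
```

The capacity ratio only covers data held in FP16. Anything kept in FP32, and any workspace an algorithm needs, still counts against the same budget, which is consistent with the "up to" wording of the claim.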

Key Features

  • Forward and backward convolution routines designed for convolutional neural nets, tuned for NVIDIA GPUs.
  • Always optimized for the latest NVIDIA GPU architectures.
  • Arbitrary dimension ordering, striding, and subregions for 4D tensors make integration into any neural net implementation easy.
  • Forward and backward paths for many other common layer types (ReLU, Sigmoid, Tanh, pooling, softmax).
  • Context-based API allows for easy multithreading.
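
To make the dimension ordering, striding, and subregion point concrete: a 4D tensor can be described by its sizes plus one element stride per dimension, and cuDNN's C API accepts explicit strides of this kind when a tensor is described (for example through `cudnnSetTensor4dDescriptorEx`). The sketch below is plain Python, not cuDNN. The helper names and the example sizes are hypothetical, chosen only to show how the same logical element maps to different memory offsets under NCHW and NHWC layouts, and how a subregion reuses the parent tensor's strides.

```python
def packed_strides(dims, order):
    """Element strides for a fully packed tensor.

    dims  -- sizes per axis, e.g. {"n": 2, "c": 3, "h": 4, "w": 5}
    order -- memory order from outermost to innermost, e.g. "nchw"
    """
    strides = {}
    step = 1
    for axis in reversed(order):
        strides[axis] = step
        step *= dims[axis]
    return strides

def offset(index, strides):
    """Linear element offset of a logical (n, c, h, w) index."""
    return sum(index[axis] * strides[axis] for axis in index)

def subregion_offset(origin, index, strides):
    """Offset of `index` inside a view that starts at `origin` in the parent.

    The view keeps the parent's strides; only its start and sizes differ.
    """
    return offset(origin, strides) + offset(index, strides)

dims = {"n": 2, "c": 3, "h": 4, "w": 5}
nchw = packed_strides(dims, "nchw")
nhwc = packed_strides(dims, "nhwc")

element = {"n": 1, "c": 0, "h": 2, "w": 3}
print("NCHW strides:", nchw, "offset:", offset(element, nchw))
print("NHWC strides:", nhwc, "offset:", offset(element, nhwc))

# A crop starting at h=1, w=2: index (1, 0, 1, 1) in the view is the
# same element as (1, 0, 2, 3) in the parent tensor.
origin = {"n": 0, "c": 0, "h": 1, "w": 2}
inner = {"n": 1, "c": 0, "h": 1, "w": 1}
print("subregion offset:", subregion_offset(origin, inner, nchw))
```

Because layout is carried by the strides rather than fixed by the library, a framework that stores activations in either order, or that wants to operate on a crop of a larger buffer, can describe its existing memory instead of copying it into a required format.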