Deep learning algorithms use large amounts of data and the computational power of the GPU to learn information directly from data such as images, signals, and text. Deep learning frameworks offer flexibility in designing and training custom deep neural networks and provide interfaces to common programming languages. For developers, the NVIDIA Deep Learning SDK offers powerful tools and libraries that accelerate deep learning frameworks such as Caffe2, Microsoft Cognitive Toolkit, MXNet, PyTorch, TensorFlow, and others.

Deep Learning Frameworks

Deep learning frameworks offer building blocks for designing, training, and validating deep neural networks through a high-level programming interface. Widely used deep learning frameworks such as Caffe2, Microsoft Cognitive Toolkit, MXNet, PyTorch, TensorFlow, and others rely on GPU-accelerated libraries such as cuDNN and NCCL to deliver high-performance multi-GPU accelerated training.

To learn more about these popular deep learning frameworks and to get started, visit the Deep Learning Frameworks page.

NVIDIA Deep Learning SDK

The NVIDIA Deep Learning SDK provides powerful tools and libraries for designing and deploying GPU-accelerated deep learning applications. It includes libraries for deep learning primitives, inference, video analytics, linear algebra, sparse matrices, and multi-GPU communications.


  • Deep Learning Primitives (cuDNN): High-performance building blocks for deep neural network applications including convolutions, activation functions, and tensor transformations
  • Deep Learning Inference Engine (TensorRT): High-performance deep learning inference runtime for production deployment
  • Deep Learning for Video Analytics (DeepStream SDK): High-level C++ API and runtime for GPU-accelerated transcoding and deep learning inference
  • Linear Algebra (cuBLAS): GPU-accelerated BLAS functionality that delivers 6x to 17x faster performance than CPU-only BLAS libraries
  • Sparse Matrix Operations (cuSPARSE): GPU-accelerated linear algebra subroutines for sparse matrices that deliver up to 8x faster performance than CPU BLAS (MKL), ideal for applications such as natural language processing
  • Multi-GPU Communication (NCCL): Collective communication routines, such as all-gather, reduce, and broadcast that accelerate multi-GPU deep learning training on up to eight GPUs
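To make the cuDNN entry above concrete, here is a minimal CPU-only sketch in plain NumPy of the 2-D cross-correlation (the "convolution" used in deep learning) that cuDNN implements as highly tuned GPU kernels. The function name and shapes are illustrative only and are not part of the cuDNN API.

```python
import numpy as np

def conv2d(x, w):
    """Direct 2-D cross-correlation with no padding and stride 1.
    This is the core primitive that cuDNN's convolution routines
    compute on the GPU; x is the input feature map, w the filter."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Slide the filter over the input and sum the products
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 input
k = np.ones((2, 2))                           # toy 2x2 filter
y = conv2d(x, k)                              # 3x3 output feature map
```

A framework calls this same operation millions of times during training, which is why a tuned GPU implementation matters far more than this naive loop.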
The Deep Learning SDK requires the CUDA Toolkit, which offers a comprehensive development environment for building new GPU-accelerated deep learning algorithms and for dramatically increasing the performance of existing applications.

Scaling Up Deep Learning

Kubernetes on NVIDIA GPUs and the GPU Container Runtime enable enterprises to seamlessly scale training and inference deployment across multi-cloud GPU clusters. Developers can package a GPU-accelerated application along with its dependencies into a single container, deploy it with Kubernetes, and get the best performance on NVIDIA GPUs regardless of the deployment environment.
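The packaging-and-scheduling flow above can be sketched as a Kubernetes pod spec that requests a GPU through the `nvidia.com/gpu` resource exposed by the NVIDIA device plugin. The pod and image names below are placeholders, not part of any NVIDIA product.

```yaml
# Minimal sketch: pod name and image are hypothetical placeholders;
# nvidia.com/gpu is the resource advertised by the NVIDIA device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-job            # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: example.com/my-training-image:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1         # request one NVIDIA GPU
```

Because the GPU request is part of the container spec, the same package runs unchanged on any cluster whose nodes expose NVIDIA GPUs.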

Learn more about containers and orchestrators.