Deep learning algorithms use large amounts of data and the computational power of GPUs to learn directly from data such as images, signals, and text. Deep learning frameworks offer flexibility in designing and training custom deep neural networks and provide interfaces to common programming languages. For developers, the NVIDIA Deep Learning SDK offers powerful tools and libraries for the development of deep learning frameworks such as Caffe2, Cognitive Toolkit, MXNet, PyTorch, TensorFlow, and others.

Deep Learning Frameworks

Deep learning frameworks offer building blocks for designing, training, and validating deep neural networks through a high-level programming interface. Widely used deep learning frameworks such as Caffe2, Cognitive Toolkit, MXNet, PyTorch, TensorFlow, and others rely on GPU-accelerated libraries such as cuDNN and NCCL to deliver high-performance, multi-GPU accelerated training.

To learn more about these popular deep learning frameworks and to get started, visit the Deep Learning Frameworks page.

NVIDIA Deep Learning SDK

The NVIDIA Deep Learning SDK provides powerful tools and libraries for designing and deploying GPU-accelerated deep learning applications. It includes libraries for deep learning primitives, inference, video analytics, linear algebra, sparse matrices, and multi-GPU communications.


  • Mixed precision in AI frameworks (Automatic Mixed Precision): Get up to 3x speedup running on Tensor Cores with just a few lines of code added to your existing training script
  • Deep Learning Primitives (cuDNN): High-performance building blocks for deep neural network applications including convolutions, activation functions, and tensor transformations
  • Input Data Processing (DALI): An open source data loading and augmentation library that is fast, portable and flexible
  • Multi-GPU Communication (NCCL): Collective communication routines, such as all-gather, reduce, and broadcast, that accelerate multi-GPU deep learning training
  • Deep Learning Inference Engine (TensorRT): High-performance deep learning inference runtime for production deployment
  • Deep Learning for Video Analytics (DeepStream SDK): High-level C++ API and runtime for GPU-accelerated transcoding and deep learning inference
  • Optical Flow for Video Inference (Optical Flow SDK): Set of high-level APIs that expose the latest hardware capability of Turing GPUs dedicated to computing the optical flow of pixels between images; also useful for calculating stereo disparity and depth estimation
  • High-level SDK for tuning domain-specific DNNs (Transfer Learning Toolkit): Enables end-to-end deep learning workflows for industries
  • AI-enabled Annotation for Medical Imaging (AI-Assisted Annotation SDK): AI-assisted annotation for labeling medical imaging data
  • Deep Learning GPU Training System (DIGITS): Rapidly train highly accurate deep neural networks (DNNs) for image classification, segmentation, and object detection tasks
  • Linear Algebra (cuBLAS): GPU-accelerated BLAS functionality that delivers 6x to 17x faster performance than CPU-only BLAS libraries
  • Sparse Matrix Operations (cuSPARSE): GPU-accelerated linear algebra subroutines for sparse matrices that deliver up to 8x faster performance than CPU BLAS (MKL), ideal for applications such as natural language processing
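The Automatic Mixed Precision entry above relies on running much of the training arithmetic in half precision (FP16). One supporting technique commonly used in mixed-precision training is loss scaling, which keeps small gradients from underflowing to zero in FP16. The following is a minimal sketch of that idea only, not the AMP API: the gradient value and the scale factor are hypothetical, and Python's 64-bit floats stand in for the higher-precision copy.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical values chosen for illustration only.
grad = 1e-8            # a small gradient, below the FP16 representable range
loss_scale = 65536.0   # an assumed power-of-two scale factor

# Cast directly to FP16, the gradient underflows to zero.
naive = to_fp16(grad)

# Scaling before the FP16 cast keeps the value in range; dividing by the
# same factor afterwards, in higher precision, recovers the gradient.
scaled = to_fp16(grad * loss_scale)
recovered = scaled / loss_scale

print("unscaled in FP16:", naive)
print("scaled, then unscaled:", recovered)
```

In a real framework the scaling, casting, and unscaling are handled by the mixed-precision tooling, which is why only a few lines need to change in the training script.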
The Deep Learning SDK requires the CUDA Toolkit, which offers a comprehensive development environment for building new GPU-accelerated deep learning algorithms and for dramatically increasing the performance of existing applications.
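To make the collective routines named in the NCCL entry more concrete, the sketch below simulates broadcast, reduce, and all-gather in plain Python, with each inner list standing in for one GPU's buffer. It illustrates the semantics only and is not the NCCL API: the function names, the list-of-lists representation, and the sample values are assumptions made for this example.

```python
from typing import List

Buffers = List[List[float]]  # one inner list per simulated rank (GPU)


def broadcast(buffers: Buffers, root: int) -> Buffers:
    """Every rank ends up with a copy of the root rank's buffer."""
    return [list(buffers[root]) for _ in buffers]


def reduce(buffers: Buffers, root: int) -> Buffers:
    """Sum buffers element-wise; only the root rank receives the result.

    Simplification: non-root ranks simply keep their original data.
    """
    combined = [sum(values) for values in zip(*buffers)]
    return [combined if rank == root else list(buf)
            for rank, buf in enumerate(buffers)]


def all_gather(buffers: Buffers) -> Buffers:
    """Every rank receives all ranks' buffers concatenated in rank order."""
    gathered = [x for buf in buffers for x in buf]
    return [list(gathered) for _ in buffers]


# Three simulated ranks, each holding a two-element buffer.
ranks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

print(broadcast(ranks, root=0))
print(reduce(ranks, root=0))
print(all_gather(ranks))
```

In data-parallel training, a summing collective of this kind is what combines per-GPU gradients; a GPU library performs the same logical operation directly over device memory and interconnects rather than over Python lists.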

Scaling Up Deep Learning

Kubernetes on NVIDIA GPUs and the GPU Container Runtime enable enterprises to scale up training and inference deployment to multi-cloud GPU clusters seamlessly. Developers can wrap their GPU-accelerated applications, along with their dependencies, into a single package, deploy it with Kubernetes, and deliver the best performance on NVIDIA GPUs regardless of the deployment environment.
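As a rough illustration of packaging a GPU application for Kubernetes, the sketch below builds a pod manifest that requests GPUs and prints it as JSON. It assumes a cluster where the NVIDIA device plugin exposes GPUs under the resource name nvidia.com/gpu; the helper function, pod name, and container image are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal pod manifest that requests `gpus` NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # container image bundling app + dependencies
                    # GPUs are requested as an extended resource in `limits`.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical job name and image, for illustration only.
manifest = gpu_pod_manifest("train-job", "example.com/my-team/trainer:latest", gpus=2)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with kubectl, which accepts JSON as well as YAML manifests. The scheduler then places the pod on a node with enough free GPUs.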

Learn more about containers and orchestrators