Deep Learning Frameworks

Deep learning (DL) frameworks offer building blocks for designing, training, and validating deep neural networks through a high-level programming interface. Widely used DL frameworks, such as PyTorch, JAX, TensorFlow, PyTorch Geometric, and DGL, rely on GPU-accelerated libraries, such as cuDNN, NCCL, and DALI, to deliver high-performance, multi-GPU-accelerated training.

NVIDIA-Optimized DL Frameworks

Developers, researchers, and data scientists can get easy access to NVIDIA-optimized DL framework containers with DL examples that are performance-tuned and tested for NVIDIA GPUs. This eliminates the need to manage packages and dependencies or build DL frameworks from source. Containerized DL frameworks, with all dependencies included, provide an easy place to start developing common applications, such as conversational AI, natural language understanding (NLU), recommenders, and computer vision. Visit the NVIDIA NGC™ catalog to learn more.

PyTorch

PyTorch is a Python package that provides two high-level features:

Tensor computation (like NumPy) with strong GPU acceleration.
Deep Neural Networks (DNNs) built on a tape-based autograd system.
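
To make "tape-based autograd" concrete, here is a minimal sketch in plain Python. It is not PyTorch's implementation or API: the `Tape`, `Var`, and `run_backward` names are hypothetical and exist only for illustration. Each operation appends a small backward closure to a tape as it runs, and gradients are obtained by replaying that tape in reverse.

```python
class Tape:
    """Records one backward closure per executed operation, in execution order."""

    def __init__(self):
        self.backward_steps = []


class Var:
    """A scalar value that logs the operations applied to it on a shared tape."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)

        def step():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        self.tape.backward_steps.append(step)
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)

        def step():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad

        self.tape.backward_steps.append(step)
        return out


def run_backward(output):
    """Seed the output gradient with 1 and replay the tape in reverse order."""
    output.grad = 1.0
    for step in reversed(output.tape.backward_steps):
        step()


tape = Tape()
x = Var(3.0, tape)
y = Var(4.0, tape)
z = x * y + x  # the forward pass records two operations on the tape
run_backward(z)
print(z.value, x.grad, y.grad)  # prints: 15.0 5.0 3.0
```

Because the tape is rebuilt on every forward pass, ordinary Python control flow (loops, branches) can change the recorded computation from one run to the next.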

Reuse your favorite Python packages, such as NumPy, SciPy, and Cython, to extend PyTorch when needed.

PyTorch on NGC
Sample models
Automatic mixed precision

Model Deployment

For high-performance inference deployment of PyTorch-trained models:

1. Use the Torch-TensorRT integration to optimize and deploy models within PyTorch.
2. Export the PyTorch model to ONNX format, then import, optimize, and deploy it with NVIDIA TensorRT™, an SDK for high-performance deep learning inference.

Learning Resources

Deep Learning Documentation: Running the PyTorch Container from the NGC catalog
PyTorch Tutorials
Torch-TensorRT Documentation

TensorFlow

TensorFlow is an open-source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code. To visualize results, TensorFlow offers TensorBoard, a suite of visualization tools.
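
The data flow graph idea can be sketched in a few lines of plain Python. This is an illustration only, not TensorFlow code: `Node`, `constant`, `add`, `multiply`, `reduce_sum`, and `evaluate` are hypothetical helpers, and plain lists stand in for tensors. Nodes hold operations, the `inputs` of each node are its incoming edges, and nothing is computed until the graph is evaluated.

```python
class Node:
    """One operation in the graph; `inputs` are the edges flowing into it."""

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = tuple(inputs)


def constant(values):
    # A source node: no incoming edges, always produces the same array.
    return Node(lambda: list(values))


def add(a, b):
    return Node(lambda x, y: [p + q for p, q in zip(x, y)], (a, b))


def multiply(a, b):
    return Node(lambda x, y: [p * q for p, q in zip(x, y)], (a, b))


def reduce_sum(a):
    return Node(lambda x: sum(x), (a,))


def evaluate(node, cache=None):
    """Run a node after running every node that feeds into it."""
    if cache is None:
        cache = {}
    if node not in cache:
        args = [evaluate(parent, cache) for parent in node.inputs]
        cache[node] = node.op(*args)
    return cache[node]


# Building the graph performs no arithmetic yet.
a = constant([1.0, 2.0, 3.0])
b = constant([4.0, 5.0, 6.0])
s = add(a, b)        # edge values will be [5.0, 7.0, 9.0]
p = multiply(s, a)   # edge values will be [5.0, 14.0, 27.0]
total = reduce_sum(p)

print(evaluate(total))  # prints: 46.0
```

Separating graph construction from execution is what lets a runtime decide where each node runs, for example placing independent nodes on different devices, without changes to the code that built the graph.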

TensorFlow on NGC
TensorFlow on GitHub
Sample models
Automatic mixed precision
TensorFlow for JetPack

Model Deployment

For high-performance inference deployment of TensorFlow-trained models:

1. Use the TensorFlow-TensorRT integration to optimize and deploy models within TensorFlow.
2. Export the TensorFlow model to ONNX, then import, optimize, and deploy it with NVIDIA TensorRT, an SDK for high-performance deep learning inference.

Learning Resources

Deep Learning Documentation: TensorFlow User Guide
Deep Learning Documentation: TensorFlow Best Practices
TensorFlow Getting Started Guide

JAX

JAX is a Python library designed for high-performance numerical computing and machine learning research. JAX can automatically differentiate native Python and NumPy functions, and it implements the NumPy API. With just a few lines of code changes, JAX enables distributed training across multi-node, multi-GPU systems, with accelerated performance through XLA.
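
What it means to differentiate native Python can be illustrated with a small forward-mode sketch built on dual numbers. This is not how JAX is implemented and not its API (in JAX the usual entry point is `jax.grad`); the `Dual`, `sin`, and `derivative` names below are hypothetical and use only the standard library. The point is that an ordinary Python function, loop included, can be differentiated without being rewritten.

```python
import math


class Dual:
    """A value paired with its derivative with respect to the input."""

    def __init__(self, value, tangent=0.0):
        self.value = value
        self.tangent = tangent

    @staticmethod
    def lift(x):
        # Plain numbers are constants, so their derivative is zero.
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.tangent * other.value + self.value * other.tangent)

    __rmul__ = __mul__


def sin(x):
    x = Dual.lift(x)
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.tangent)


def derivative(f):
    """Return a function that evaluates df/dx at a point."""
    def df(x):
        return f(Dual(x, 1.0)).tangent  # seed dx/dx = 1
    return df


def f(x):
    # Ordinary Python: a loop, arithmetic, and a function call.
    total = 0.0
    for k in range(1, 4):
        total = total + k * x * x
    return total + sin(x)  # equals 6*x**2 + sin(x)


print(derivative(f)(2.0))    # numerically equal to the line below
print(12 * 2.0 + math.cos(2.0))
```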

JAX on NGC
JAX on GitHub

PaddlePaddle

PaddlePaddle provides an intuitive and flexible interface for loading data and specifying model structures. It supports CNNs, RNNs, and many of their variants, and makes it easy to configure complicated deep models.

PaddlePaddle also provides highly optimized operations, memory recycling, and network communication, and makes it easy to scale heterogeneous computing resources and storage to accelerate the training process.

PaddlePaddle on NGC
PaddlePaddle install page
PaddlePaddle source

Model Deployment

For high-performance inference deployment of PaddlePaddle-trained models:

1. Use the Paddle-TensorRT integration to optimize and deploy models within PaddlePaddle.
2. Export the PaddlePaddle model to ONNX, then import, optimize, and deploy it with NVIDIA TensorRT, an SDK for high-performance deep learning inference.

Learning Resources

PaddlePaddle Getting Started Guide

MXNet

MXNet is a DL framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity.

At its core is a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. The library is portable and lightweight, and it scales to multiple GPUs and machines.
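
The idea behind a dependency scheduler can be sketched as follows. This is not MXNet's engine: the `schedule` function and the operation names are hypothetical, and the sketch assumes each operation declares which variables it reads and writes. Operations are grouped into waves so that everything in one wave is free of dependencies on the rest of that wave and could therefore run in parallel.

```python
def schedule(ops):
    """Group operations into waves of mutually independent work.

    `ops` is a list of (name, reads, writes) tuples in program order.
    """
    last_writer = {}  # variable -> index of the op that last wrote it
    readers = {}      # variable -> ops that read it since that write
    level = []        # wave index assigned to each op

    for i, (name, reads, writes) in enumerate(ops):
        deps = set()
        for var in reads:
            if var in last_writer:
                deps.add(last_writer[var])       # read-after-write
        for var in writes:
            if var in last_writer:
                deps.add(last_writer[var])       # write-after-write
            deps.update(readers.get(var, ()))    # write-after-read
        # An op runs one wave after the latest op it depends on.
        level.append(1 + max((level[d] for d in deps), default=-1))
        for var in reads:
            readers.setdefault(var, []).append(i)
        for var in writes:
            last_writer[var] = i
            readers[var] = []

    waves = [[] for _ in range(max(level, default=-1) + 1)]
    for (name, _, _), wave in zip(ops, level):
        waves[wave].append(name)
    return waves


ops = [
    ("load_a",   [],         ["a"]),
    ("load_b",   [],         ["b"]),
    ("scale_a",  ["a"],      ["c"]),
    ("shift_b",  ["b"],      ["d"]),
    ("combine",  ["c", "d"], ["e"]),
    ("update_a", ["a"],      ["a"]),  # in-place update must wait for scale_a
]

for wave in schedule(ops):
    print(wave)
# ['load_a', 'load_b']
# ['scale_a', 'shift_b']
# ['combine', 'update_a']
```

A real scheduler would dispatch operations as soon as their dependencies complete rather than in fixed waves; the wave grouping here only makes the available parallelism easy to see.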

MXNet on NGC
Sample models
Automatic mixed precision

Model Deployment

For high-performance inference deployment of MXNet-trained models, export to ONNX format, then optimize and deploy with NVIDIA TensorRT, an SDK for high-performance deep learning inference.

Learning Resources

Deep Learning Documentation: Running the MXNet Container

MATLAB

MATLAB makes DL easy for engineers, scientists, and domain experts. In addition to tools and functions for managing and labeling large data sets, MATLAB offers specialized toolboxes for working with machine learning, neural networks, computer vision, and automated driving. With just a few lines of code, you can create and visualize models and deploy them to servers and embedded devices without being an expert. MATLAB can also automatically generate high-performance CUDA code for DL and vision applications from MATLAB code.

MATLAB on NGC
MATLAB for deep learning

Model Deployment

For high-performance inference deployment of MATLAB-trained models, use MATLAB GPU Coder to automatically generate TensorRT-optimized inference engines for deployment environments ranging from cloud to embedded.