NVIDIA Optimized Framework Containers FAQs

The NVIDIA NGC catalog contains a host of GPU-optimized containers for deep learning, machine learning, visualization, and high-performance computing (HPC) applications, all tested for correctness, functionality, performance, security, and scalability. NVIDIA Optimized DL Framework Containers are available as Docker images for training and inference with PyTorch, JAX, TensorFlow, PaddlePaddle, Deep Graph Library (DGL), and PyTorch Geometric (PyG).

Building and maintaining DL frameworks is complex due to rapid updates and the need for optimization across GPU architectures. NVIDIA addresses these challenges by providing DL framework containers that are regularly updated with the latest software libraries, frameworks, and driver versions. These containers are meticulously tested for compatibility and security, ensuring a stable and validated software stack for DL teams.

Using NVIDIA Optimized DL Framework Containers reduces the burden on operations and infrastructure teams, accelerates the development of DL products, and ensures that DL teams are using the same validated software stack used within NVIDIA. The containers handle the selection, versioning, and integration of all NVIDIA software components, facilitating a seamless and efficient development process. For additional details on the software and hardware we test against, see Frameworks Support Matrix - NVIDIA Docs.

The NVIDIA PyTorch team is responsible for both upstream PyTorch development and our monthly container releases. We actively contribute code, optimizations, and fixes to the PyTorch GitHub repository as official PyTorch Maintainers, working closely with the broader PyTorch community. Our team conducts rigorous nightly testing to ensure PyTorch remains in a consistently healthy state. This ongoing testing allows us to quickly identify and address any issues that may arise, maintaining the framework's stability and performance.

In addition to our upstream contributions, we release monthly NVIDIA Optimized PyTorch Containers. These containers include the latest performance enhancements, security fixes, and optimizations, often ahead of official PyTorch releases. Since upstream stable versions are updated less frequently, we carefully select a specific PyTorch commit (e.g., 2.4.0a0+f70bd71a48 for version 24.06) that ensures the best correctness, functionality, and security. This approach allows PyTorch users to access the latest improvements without waiting months for an upstream stable release. Our extensive testing across a wide range of NVIDIA hardware ensures that these monthly releases match or exceed the quality of stable upstream releases.

With our comprehensive testing and more frequent production-quality releases, developers and researchers benefit from the latest advancements in PyTorch.

NVIDIA Optimized DL Framework Containers are built, tested, and optimized for use across all cloud providers and on-premises. The containers are tested across a variety of GPU platforms, from a single-GPU workstation and a DGX A100 server to a DGX SuperPOD cluster.

You can access Docker images for NVIDIA Optimized DL Framework Containers from the NGC Catalog, which provides a list of all available images.
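As a sketch, pulling and running a container from the NGC catalog looks like the following; the `24.06-py3` tag is only an example, and you should pick the release you need from the catalog page:

```shell
# Pull the NVIDIA Optimized PyTorch container from the NGC catalog.
# Tags follow a YY.MM scheme; 24.06-py3 is an example release.
docker pull nvcr.io/nvidia/pytorch:24.06-py3

# Run it interactively with GPU access. --ipc=host is commonly needed
# because PyTorch DataLoader workers communicate via shared memory.
docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:24.06-py3
```

The same pattern applies to the other framework images (e.g., `nvcr.io/nvidia/tensorflow`, `nvcr.io/nvidia/jax`), substituting the appropriate repository name and tag.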

You can customize the containers by following our guides and examples in Containers For Deep Learning Frameworks User Guide - NVIDIA Docs.
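A common customization is to use one of our images as a Dockerfile base and layer your own dependencies on top of the validated stack. The sketch below assumes a hypothetical project; the tag and the pip packages are placeholders for your own choices:

```dockerfile
# Start from an NVIDIA Optimized PyTorch container (example tag;
# choose the release you need from the NGC catalog).
FROM nvcr.io/nvidia/pytorch:24.06-py3

# Install additional Python dependencies on top of the validated stack
# (placeholder packages; replace with your own requirements).
RUN pip install --no-cache-dir transformers datasets

# Copy in your training code and set the working directory.
COPY . /workspace/my_project
WORKDIR /workspace/my_project
```

Building on the NVIDIA image rather than a generic base keeps the tested CUDA, cuDNN, and framework versions intact while letting you add project-specific layers.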

Since we include a custom commit of torch in our containers, installing torchaudio requires a source build. You can find an example of how we add torchaudio to our NeMo containers, which are built on top of our PyTorch containers. Alternatively, you can use the NeMo container, which includes both torch and torchaudio.
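A source build inside a running PyTorch container might look like the following sketch; the branch name is a placeholder and must be chosen to match the torch version shipped in your container release:

```shell
# Inside a running NVIDIA PyTorch container: build torchaudio from source
# against the preinstalled custom torch build. Do NOT install the prebuilt
# torchaudio wheel, which would pull in a mismatched torch.
git clone https://github.com/pytorch/audio.git
cd audio

# Check out a branch compatible with the container's torch version
# (placeholder below; consult the torchaudio compatibility matrix).
git checkout release/2.4

# --no-build-isolation makes the build use the torch already installed
# in the container instead of downloading one.
pip install --no-build-isolation -v .
```

The key point is that the build must link against the container's own torch installation, which is why a plain `pip install torchaudio` is not suitable here.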

While we don’t distribute a Dockerfile, you can find a list of all the important libraries and their versions in our DL framework containers in Frameworks Support Matrix - NVIDIA Docs. For JAX, we provide open Dockerfiles in JAX Toolbox that can be modified to build your own DL container images.