
Enabling GPUs in the Container Runtime Ecosystem


NVIDIA uses containers to develop, test, benchmark, and deploy deep learning (DL) frameworks and HPC applications. We wrote about building and deploying GPU containers at scale using NVIDIA-Docker roughly two years ago. Since then, NVIDIA-Docker has been downloaded close to 2 million times, and a variety of customers have used it to containerize and run GPU-accelerated workloads.

NVIDIA offers GPU accelerated containers via NVIDIA GPU Cloud (NGC) for use on DGX systems, public cloud infrastructure, and even local workstations with GPUs. NVIDIA-Docker has been the critical underlying technology for these initiatives.

The adoption of container technologies other than Docker, along with an ever-evolving set of use cases for DL and HPC workloads, led us to fundamentally rethink the existing NVIDIA-Docker architecture. Our primary goal was extensibility across not only various container runtimes but also container orchestration systems.

The NVIDIA Container Runtime introduced here is our next-generation GPU-aware container runtime. It is compatible with the Open Containers Initiative (OCI) specification used by Docker, CRI-O, and other popular container technologies.

You’ll learn about the NVIDIA Container Runtime components and how the runtime can be extended to support multiple container technologies. Let’s examine the architecture and benefits of the new runtime, showcase some of the new features, and walk through some examples of deploying GPU-accelerated applications using Docker and LXC.

NVIDIA Container Runtime

NVIDIA designed NVIDIA-Docker in 2016 to enable portability in Docker images that leverage NVIDIA GPUs. It enabled driver-agnostic CUDA images and provided a Docker command-line wrapper that mounted the user-mode components of the driver and the GPU device files into the container at launch.

Over the lifecycle of NVIDIA-Docker, we realized the architecture lacked flexibility for a few reasons:

  • Tight integration with Docker did not allow support for other container technologies, such as LXC and CRI-O, in the future
  • We wanted to leverage other tools in the Docker ecosystem, e.g. Compose (for managing applications composed of multiple containers)
  • We wanted GPUs supported as a first-class resource in orchestrators such as Kubernetes and Swarm
  • We wanted to improve container runtime support for GPUs, especially automatic detection of user-level NVIDIA driver libraries, NVIDIA kernel modules, device ordering, compatibility checks, and GPU features such as graphics and video acceleration

As a result, the redesigned NVIDIA-Docker moved the core runtime support for GPUs into a library called libnvidia-container. The library relies on Linux kernel primitives and is agnostic to the higher container runtime layers, which makes it easy to extend GPU support to different container runtimes such as Docker, LXC, and CRI-O. The library includes a command-line utility and also provides an API for integration into other runtimes in the future. The library, tools, and the layers we built to integrate into various runtimes are collectively called the NVIDIA Container Runtime.

In the next few sections, you’ll learn about the integration into both Docker and LXC.

Support for Docker

Before diving into NVIDIA Container Runtime integration with Docker, let’s briefly look at how the Docker platform has evolved.

Since 2015, Docker has been donating key components of its container platform, starting with the Open Containers Initiative (OCI) specification and runc, a lightweight container runtime that implements the specification. In late 2016, Docker also donated containerd, a daemon that manages the container lifecycle and wraps OCI/runc. The containerd daemon handles transfer of images, execution of containers (with runc), storage, and network management. It is designed to be embedded into larger systems such as Docker. More information on the project is available on the official site.

Figure 1 shows how libnvidia-container integrates into Docker, specifically at the runc layer. We add a custom OCI prestart hook called nvidia-container-runtime-hook to runc in order to enable GPU containers in Docker (more information about hooks can be found in the OCI runtime spec). Adding the prestart hook to runc requires us to register a new OCI-compatible runtime with Docker (using the --runtime option). At container creation time, the prestart hook checks whether the container is GPU-enabled (using environment variables) and uses the container runtime library to expose the NVIDIA GPUs to the container.

Figure 1. Integration of NVIDIA Container Runtime with Docker

Integration at the runc layer also allows flexibility to support other OCI runtimes such as CRI-O. Version 1.1 of containerd added support for the Container Runtime Interface (CRI) in Kubernetes; last week Kubernetes announced the general availability of the containerd integration via the CRI plugin. The new architecture of the NVIDIA runtime can easily support either choice of runtime with Kubernetes. This level of flexibility is important as we work closely with the community to enable first-class GPU support in Kubernetes.
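Conceptually, enabling the hook amounts to a prestart entry in the OCI config.json that runc consumes. The sketch below writes an illustrative entry to a temporary file so you can see the shape; the exact JSON the NVIDIA runtime generates may differ.

```shell
# Sketch: what a prestart hook entry in an OCI config.json looks like.
# Written to a temp file here for illustration; the NVIDIA runtime injects
# an equivalent entry into the container's config before invoking runc.
ocicfg=$(mktemp)
tee "$ocicfg" > /dev/null <<'EOF'
{
    "hooks": {
        "prestart": [
            {
                "path": "/usr/bin/nvidia-container-runtime-hook",
                "args": ["nvidia-container-runtime-hook", "prestart"]
            }
        ]
    }
}
EOF
grep -q "prestart" "$ocicfg" && echo "hook entry present"
```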

Environment Variables

The NVIDIA Container Runtime uses environment variables in container images to specify a GPU accelerated container.

  1. NVIDIA_VISIBLE_DEVICES : controls which GPUs will be accessible inside the container. By default, all GPUs are accessible to the container.
  2. NVIDIA_DRIVER_CAPABILITIES : controls which driver features (e.g. compute, graphics) are exposed to the container.
  3. NVIDIA_REQUIRE_* : a logical expression to define the constraints (e.g. minimum CUDA, driver or compute capability) on the configurations supported by the container.

If no environment variables are detected (either on the Docker command line or in the image), the default runc is used. You can find more information on these environment variables in the NVIDIA Container Runtime documentation. These environment variables are already set in the official CUDA containers from NVIDIA.
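To illustrate how these variables combine on the command line, the sketch below assembles a docker run invocation from them. gpu_run_cmd is a hypothetical helper (not part of the NVIDIA tooling) that prints the command rather than executing it, so it runs safely on any machine.

```shell
# Hypothetical helper: build (but do not execute) a docker run command that
# selects GPUs and driver capabilities via the runtime's environment variables.
gpu_run_cmd() {
    local devices="$1" caps="$2" image="$3"
    echo "docker run --rm --runtime=nvidia" \
         "-e NVIDIA_VISIBLE_DEVICES=${devices}" \
         "-e NVIDIA_DRIVER_CAPABILITIES=${caps}" \
         "${image}"
}

# Dry run: request GPUs 0 and 1 with compute and utility capabilities.
gpu_run_cmd "0,1" "compute,utility" "nvidia/cuda"
```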


Your system must satisfy the following prerequisites to begin using NVIDIA Container Runtime with Docker.

  1. Supported version of Docker for your distribution. Follow the official instructions from Docker.
  2. The latest NVIDIA driver. Use the package manager to install the cuda-drivers package or use the installer from the driver downloads site. Note that using the cuda-drivers package may not work on Ubuntu 18.04 LTS systems.

To get started using the NVIDIA Container Runtime with Docker, either use the nvidia-docker2 installer packages or manually setup the runtime with Docker Engine. The nvidia-docker2 package includes a custom daemon.json file to register the NVIDIA runtime as the default with Docker and a script for backwards compatibility with nvidia-docker 1.0.
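For reference, the registration installed by nvidia-docker2 boils down to a daemon.json entry along these lines. The sketch writes it to a temporary file for inspection rather than to /etc/docker/daemon.json; treat the exact contents as illustrative of what the package installs.

```shell
# Write the runtime registration to a temp file for inspection; the installer
# places the equivalent content in /etc/docker/daemon.json.
tmpconf=$(mktemp)
tee "$tmpconf" > /dev/null <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
cat "$tmpconf"
```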

If you have nvidia-docker 1.0 installed, you need to remove it and any existing GPU containers before installing the NVIDIA runtime. Note that the following installation steps apply to Debian distributions and their derivatives.

$ docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f

$ sudo apt-get purge -y nvidia-docker

Now, let’s add the package repositories and refresh the package index.

$ curl -s -L | \
sudo apt-key add -

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

$ curl -s -L$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update

Then install the various components using the nvidia-docker2 package and reload the Docker daemon configuration.

$ sudo apt-get install -y nvidia-docker2

$ sudo pkill -SIGHUP dockerd

Run the nvidia-container-cli utility (provided as part of the installer packages) to verify that the NVIDIA driver and runtime have installed correctly on your system. The CLI provides information on the driver and devices detected in the system. In this example, the runtime library has correctly detected and enumerated four NVIDIA Tesla V100 GPUs in the system.

$ sudo nvidia-container-cli --load-kmods info
NVRM version:   396.26
CUDA version:   9.2

Device Index:   0
Device Minor:   2
Model:              Tesla V100-SXM2-16GB
GPU UUID:           GPU-e354d47d-0b3e-4128-74bf-f1583d34af0e
Bus Location:   00000000:00:1b.0
Architecture:   7.0

Device Index:   1
Device Minor:   0
Model:              Tesla V100-SXM2-16GB
GPU UUID:           GPU-716346f4-da29-392a-c4ee-b9840ec2f2e9
Bus Location:   00000000:00:1c.0
Architecture:   7.0

Device Index:   2
Device Minor:   3
Model:              Tesla V100-SXM2-16GB
GPU UUID:           GPU-9676587f-b418-ee6b-15ac-38470e1278fb
Bus Location:   00000000:00:1d.0
Architecture:   7.0

Device Index:   3
Device Minor:   2
Model:              Tesla V100-SXM2-16GB
GPU UUID:           GPU-2370332b-9181-d6f5-1f24-59d66fc7a87e
Bus Location:   00000000:00:1e.0
Architecture:   7.0

The CUDA version detected by nvidia-container-cli verifies whether the NVIDIA driver installed on your host is sufficient to run a container based on a specific CUDA version. If an incompatibility exists, the runtime will not start the container. More information on compatibility and minimum driver requirements for CUDA is available here.
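As a rough illustration of that check (not the runtime's actual code), it can be thought of as a version comparison between the host's CUDA version, as reported by nvidia-container-cli, and the image's requirement:

```shell
# Simplified sketch of the compatibility check: succeed only if the host's
# CUDA version meets the version required by the container image.
cuda_ok() {
    local host="$1" required="$2"
    # sort -V orders version strings numerically; the host satisfies the
    # requirement when the required version is the smaller of the pair.
    [ "$(printf '%s\n%s\n' "$required" "$host" | sort -V | head -n1)" = "$required" ]
}

cuda_ok "9.2" "9.0" && echo "container can start"
cuda_ok "9.2" "10.0" || echo "container would be rejected"
```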

Now, let’s try running a GPU container with Docker. This example pulls the NVIDIA CUDA container available on the Docker Hub repository and runs the nvidia-smi command inside the container.

$ sudo docker run --rm --runtime=nvidia -ti nvidia/cuda
root@d6c41b66c3b4:/# nvidia-smi
Sun May 20 22:06:13 2018
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla V100-SXM2...  On   | 00000000:00:1B.0 Off |                  Off |
| N/A   41C    P0    34W / 300W |      0MiB / 16160MiB |      0%      Default |
|   1  Tesla V100-SXM2...  On   | 00000000:00:1C.0 Off |                  Off |
| N/A   39C    P0    35W / 300W |      0MiB / 16160MiB |      0%      Default |
|   2  Tesla V100-SXM2...  On   | 00000000:00:1D.0 Off |                  Off |
| N/A   39C    P0    38W / 300W |      0MiB / 16160MiB |      0%      Default |
|   3  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   42C    P0    38W / 300W |      0MiB / 16160MiB |      0%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No running processes found                                                 |

Running GPU Containers

Let’s now look at some examples of running more complex GPU applications. NVIDIA offers a variety of pre-built containers for deep learning and HPC on the NGC registry.

Deep Learning Framework Container

This example trains a deep neural network using the PyTorch deep learning framework container available from NGC. You’ll need to open a free NGC account to access the latest deep learning framework and HPC containers. The NGC documentation outlines the steps required to get started.

This example uses the NVIDIA_VISIBLE_DEVICES variable to expose only two GPUs to the container.

$ sudo docker run -it --runtime=nvidia --shm-size=1g -e NVIDIA_VISIBLE_DEVICES=0,1 --rm

Copyright (c) 2006          Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)

All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

Run the nvidia-smi command inside the container to verify only two GPUs are visible.

root@45cebefa1480:/workspace# nvidia-smi
Mon May 28 07:15:39 2018

| NVIDIA-SMI 396.26                     Driver Version: 396.26                |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla V100-SXM2...  On   | 00000000:00:1B.0 Off |                   0  |
| N/A   39C     P0   36W / 300W |      0MiB / 16160MiB |      0%     Default  |
|   1  Tesla V100-SXM2...  On   | 00000000:00:1C.0 Off |                   0  |
| N/A  38C    P0     35W / 300W |      0MiB / 16160MiB |      0%     Default  |

| Processes:                                                       GPU Memory |
|  GPU           PID   Type   Process name                         Usage      |
|  No running processes found                                                 |


Try running the MNIST training example included with the container:

root@45cebefa1480:/workspace/examples/mnist# python
Done!
UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  return F.log_softmax(x)
UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  100. * batch_idx / len(train_loader),[0]))
Train Epoch: 1 [0/60000 (0%)]   Loss: 2.373651
Train Epoch: 1 [640/60000 (1%)] Loss: 2.310517
Train Epoch: 1 [1280/60000 (2%)]            Loss: 2.281828
Train Epoch: 1 [1920/60000 (3%)]            Loss: 2.315808
Train Epoch: 1 [2560/60000 (4%)]            Loss: 2.235439
Train Epoch: 1 [3200/60000 (5%)]            Loss: 2.234249
Train Epoch: 1 [3840/60000 (6%)]            Loss: 2.226109
Train Epoch: 1 [4480/60000 (7%)]            Loss: 2.228646
Train Epoch: 1 [5120/60000 (9%)]            Loss: 2.132811

OpenGL Graphics Container

As discussed in the previous sections, the NVIDIA Container Runtime now provides support for running OpenGL and EGL applications. The next example builds and runs the N-body simulation using OpenGL. Use the sample Dockerfile available on NVIDIA GitLab to build the container.

Copy the Dockerfile and build the N-body sample

$ docker build -t nbody .

Allow the root user to access the running X server

$ xhost +si:localuser:root

Run the N-body sample

$ sudo docker run --runtime=nvidia -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix nbody
Figure 2. Running the N-body CUDA / OpenGL sample with Docker

Docker Compose

The final example uses Docker Compose to show how easy it can be to launch multiple GPU containers with the NVIDIA Container Runtime. The example launches three containers: the N-body sample with OpenGL, an EGL sample (peglgears from Mesa), and a simple container that runs the nvidia-smi command.

Install Docker Compose

$ sudo curl -L`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose

Clone the samples available from NVIDIA Gitlab

$ git clone

Write a docker-compose.yml to specify the three containers and the environments. Copy the following using a text editor of your choice:

version: '2.3'

services:
     nbody:
          build: samples/cudagl/ubuntu16.04/nbody
          runtime: nvidia
          environment:
               - DISPLAY
          volumes:
               - /tmp/.X11-unix:/tmp/.X11-unix

     peglgears:
          build: samples/opengl/ubuntu16.04/peglgears
          runtime: nvidia

     nvsmi:
          image: ubuntu:18.04
          runtime: nvidia
          environment:
               - NVIDIA_VISIBLE_DEVICES=all
          command: nvidia-smi

Allow the root user to access the running X server (for the N-body sample)

$ xhost +si:localuser:root

Finally, start the containers

$ sudo docker-compose up

Your console output may appear as below

Building nbody
Step 1/6 : FROM nvidia/cudagl:9.0-base-ubuntu16.04
---> b6055709073e
---> Using cache
---> ebd1c003a592
Step 3/6 : RUN apt-get update && apt-get install -y --no-install-recommends             cuda-samples-$CUDA_PKG_VERSION &&         rm -rf /var/lib/apt/lists/*
---> Using cache
---> 1987dc2c1bbc
Step 4/6 : WORKDIR /usr/local/cuda/samples/5_Simulations/nbody
---> Using cache
---> de7af4fbb03e
Step 5/6 : RUN make
---> Using cache
---> a6bcfb9a4958
Step 6/6 : CMD ./nbody
---> Using cache
---> 9c11a1e93ef2
Successfully built 9c11a1e93ef2
Successfully tagged ubuntu_nbody:latest
WARNING: Image for service nbody was built because it did not already exist. To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Starting ubuntu_nbody_1         ... done
Starting ubuntu_nvsmi_1         ... done
Starting ubuntu_peglgears_1 ... done
Attaching to ubuntu_nvsmi_1, ubuntu_peglgears_1, ubuntu_nbody_1
ubuntu_nvsmi_1 exited with code 0
peglgears_1  | peglgears: EGL version = 1.4
peglgears_1  | peglgears: EGL_VENDOR = NVIDIA
peglgears_1  | 246404 frames in 5.0 seconds = 49280.703 FPS
ubuntu_peglgears_1 exited with code 0

Support for GPU Containers with LXC

Linux Containers (LXC) is an OS-level virtualization tool for creating and managing system or application containers. Early releases of Docker used LXC as the underlying container runtime technology. LXC offers an advanced set of tools to manage containers (e.g. templates, storage options, passthrough devices, autostart, etc.) and gives the user a lot of control. We have provided a link to a GTC 2018 talk on LXC by engineers from Canonical and Cisco in the references at the end of this post.

LXC supports unprivileged containers (using the user namespaces feature in the Linux kernel). This is a great advantage for deploying containers in HPC environments, where users may not have administrative rights to run containers. LXC also supports the import of Docker images; we explore an example in more detail below.

NVIDIA continues to work closely with the LXC community on upstreaming patches to add GPU support. LXC 3.0.0 released in early April includes support for GPUs using the NVIDIA runtime. For more information and a demo, see this news post from Canonical.

Figure 3 shows how the container runtime library (libnvidia-container) integrates into LXC.

Figure 3. Integration of NVIDIA Container Runtime with LXC

Let’s look at running a simple CUDA container with LXC. This example shows how the default LXC OCI template can be used to create an application container from OCI images such as those available on Docker Hub (using tools such as skopeo and umoci).

First, let’s set up the repositories for the tools:

$ sudo add-apt-repository ppa:ubuntu-lxc/lxc-stable
$ sudo apt-add-repository ppa:projectatomic/ppa

Install LXC and dependent tools such as skopeo:

$ sudo apt-get install libpam-cgfs lxc-utils lxcfs lxc-templates skopeo skopeo-containers jq libnvidia-container-tools

Set up umoci:

$ sudo curl -fsSL -o /usr/local/bin/umoci
$ sudo chmod ugo+rx /usr/local/bin/umoci

Set up user and group IDs and virtual ethernet interfaces for each user. Refer to the LXC documentation on creating unprivileged containers. The sample scripts are provided here for convenience.

$ sudo curl -fsSL -o /usr/local/bin/generate-lxc-perms
$ sudo chmod ugo+rx /usr/local/bin/generate-lxc-perms

$ sudo curl -fsSL -o /usr/local/bin/generate-lxc-config
$ sudo chmod ugo+rx /usr/local/bin/generate-lxc-config

Now, set up GPU support for every container:

$ sudo tee /usr/share/lxc/config/common.conf.d/nvidia.conf <<< 'lxc.hook.mount = /usr/share/lxc/hooks/nvidia'
$ sudo chmod ugo+r /usr/share/lxc/config/common.conf.d/nvidia.conf
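The nvidia.conf above enables the hook globally. To make the per-container picture concrete, an equivalent configuration that also selects GPUs through the same environment variables used by the Docker integration might look like the fragment below, written to a temporary file here for illustration; treat the lxc.environment lines as illustrative.

```shell
# Illustrative per-container LXC config enabling GPU support; the hook path
# matches the one configured globally in /usr/share/lxc/config above.
lxcconf=$(mktemp)
tee "$lxcconf" > /dev/null <<'EOF'
# Run the NVIDIA hook at mount time to expose driver libraries and devices
lxc.hook.mount = /usr/share/lxc/hooks/nvidia
# Same environment variables the Docker integration honors
lxc.environment = NVIDIA_VISIBLE_DEVICES=0
lxc.environment = NVIDIA_DRIVER_CAPABILITIES=compute,utility
EOF
cat "$lxcconf"
```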

As a one-time step, set up the permissions and configuration as a regular user:

$ sudo generate-lxc-perms
$ generate-lxc-config

Use lxc-create to download and create a CUDA application container from the CUDA image available on NVIDIA’s Docker Hub repository.

$ lxc-create -t oci cuda -- -u docker://nvidia/cuda
Getting image source signatures
Copying blob sha256:297061f60c367c17cfd016c97a8cb24f5308db2c913def0f85d7a6848c0a17fa
41.03 MB / 41.03 MB [======================================================] 0s
Copying blob sha256:e9ccef17b516e916aa8abe7817876211000c27150b908bdffcdeeba938cd004c
850 B / 850 B [============================================================] 0s
Copying blob sha256:dbc33716854d9e2ef2de9769422f498f5320ffa41cb79336e7a88fbb6c3ef844
621 B / 621 B [============================================================] 0s
Copying blob sha256:8fe36b178d25214195af42254bc7d5d64a269f654ef8801bbeb0b6a70a618353
851 B / 851 B [============================================================] 0s
Copying blob sha256:686596545a94a0f0bf822e442cfd28fbd8a769f28e5f4018d7c24576dc6c3aac
169 B / 169 B [============================================================] 0s
Copying blob sha256:aa76f513fc89f79bec0efef655267642eba8deac019f4f3b48d2cc34c917d853
6.65 MB / 6.65 MB [========================================================] 0s
Copying blob sha256:c92f47f1bcde5f85cde0d7e0d9e0caba6b1c9fcc4300ff3e5f151ff267865fb9
397.29 KB / 397.29 KB [====================================================] 0s
Copying blob sha256:172daef71cc32a96c15d978fb01c34e43f33f05d8015816817cc7d4466546935
182 B / 182 B [============================================================] 0s
Copying blob sha256:e282ce84267da687f11d354cdcc39e2caf014617e30f9fb13f7711c7a93fb414
449.41 MB / 449.41 MB [====================================================] 8s
Copying blob sha256:91cebab434dc455c4a9faad8894711a79329ed61cc3c08322285ef20599b4c5e
379.37 MB / 552.87 MB [=====================================>-----------------]
Writing manifest to image destination
Storing signatures
Unpacking the rootfs
     • rootless{dev/agpgart} creating empty file in place of device 10:175
     • rootless{dev/audio} creating empty file in place of device 14:4
     • rootless{dev/audio1} creating empty file in place of device 14:20

As a regular user, we can run nvidia-smi inside the container:

$ lxc-execute cuda

root@cuda:/# nvidia-smi
Mon May 28 21:48:57 2018
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla V100-SXM2...  On   | 00000000:00:1B.0 Off |                    0 |
| N/A   40C    P0    36W / 300W |      0MiB / 16160MiB |      0%      Default |
|   1  Tesla V100-SXM2...  On   | 00000000:00:1C.0 Off |                    0 |
| N/A   39C    P0    35W / 300W |      0MiB / 16160MiB |      0%      Default |
|   2  Tesla V100-SXM2...  On   | 00000000:00:1D.0 Off |                    0 |
| N/A   39C    P0    38W / 300W |      0MiB / 16160MiB |      1%      Default |
|   3  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   40C    P0    38W / 300W |      0MiB / 16160MiB |      1%      Default |
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No running processes found                                                 |


This post covered the NVIDIA Container Runtime and how it can be easily integrated into the container runtime and orchestration ecosystem to enable GPU support. Get started building and running GPU containers with it today! Installer packages are available for a variety of Linux distributions. NVIDIA-Docker 1.0 is deprecated and no longer actively supported; we strongly encourage users to upgrade to the new NVIDIA runtime when using Docker. The future roadmap includes a number of exciting features, including support for Vulkan, CUDA MPS, containerized drivers, and much more.

If you are running containers on public cloud service providers such as AWS or Google Cloud, NVIDIA offers virtual machine images that include all the components you need, including the NVIDIA Container Runtime, to get started.

If you have questions or comments please leave them below in the comments section. For technical questions about installation and usage, we recommend starting a discussion on the NVIDIA Accelerated Computing forum.


References

[1] Watch a 3-part series on installing NVIDIA Container Runtime and using it with NGC containers

[2] Using Container for GPU Workloads (GTC 2018 talk on LXC)

[3] Frequently asked questions are available in the documentation
