Kubernetes on NVIDIA GPUs

Scale-out GPU-Accelerated Applications

Kubernetes on NVIDIA GPUs enables enterprises to scale training and inference deployments seamlessly across multi-cloud GPU clusters. It lets you automate the deployment, maintenance, scheduling, and operation of multiple GPU-accelerated application containers across clusters of nodes.

With the increasing number of AI-powered applications and services and the broad availability of GPUs in the public cloud, open-source Kubernetes needs to be GPU-aware. With Kubernetes on NVIDIA GPUs, software developers and DevOps engineers can build and deploy GPU-accelerated deep learning training or inference applications to heterogeneous GPU clusters at scale.

Benefits

Kubernetes on NVIDIA GPUs extends the industry-standard container orchestration platform with GPU acceleration capabilities. With first-class support for GPU resource scheduling, developers and DevOps engineers can build, deploy, orchestrate, and monitor GPU-accelerated application deployments on heterogeneous, multi-cloud clusters.

Simplify large-scale deployments of GPU-accelerated applications

Orchestrate deep learning and HPC applications on heterogeneous GPU clusters, with easy-to-specify attributes such as GPU type and memory requirement.

Maximize GPU cluster utilization with platform monitoring

Analyze and improve GPU utilization on clusters with integrated metrics and monitoring capabilities. Identify power inefficiencies and other issues, then adjust application logic to keep GPUs as fully utilized as possible.
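As a rough illustration of the kind of analysis this enables, the Python sketch below scans metrics in the Prometheus text exposition format and flags GPUs whose utilization falls below a threshold. The metric name `dcgm_gpu_utilization`, the `gpu` label, the 30% threshold, and the sample values are assumptions made for the example; check the names your monitoring stack actually exports.

```python
import re

# One Prometheus sample line: metric_name{label="value",...} number
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def underutilized_gpus(metrics_text, metric_name="dcgm_gpu_utilization", threshold=30.0):
    """Return (gpu_id, utilization) pairs whose utilization is below threshold.

    ASSUMPTION: metric_name and the "gpu" label are illustrative; the names
    exported by a real exporter may differ.
    """
    flagged = []
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None or match.group("name") != metric_name:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        value = float(match.group("value"))
        if value < threshold:
            flagged.append((labels.get("gpu", "?"), value))
    return flagged


# Hypothetical scrape output, made up for this example.
SAMPLE = """\
# HELP dcgm_gpu_utilization GPU utilization (in %).
# TYPE dcgm_gpu_utilization gauge
dcgm_gpu_utilization{gpu="0"} 92
dcgm_gpu_utilization{gpu="1"} 7
dcgm_power_usage{gpu="0"} 153.2
"""

if __name__ == "__main__":
    for gpu_id, util in underutilized_gpus(SAMPLE):
        print(f"GPU {gpu_id} is underutilized at {util:.0f}%")
```

In a running cluster the same check would more likely be written as a Prometheus alerting rule or a Grafana panel; the script only shows the logic.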

Tested, validated and maintained by NVIDIA

Kubernetes on NVIDIA GPUs has been tested and qualified on all NVIDIA DGX systems (DGX-1 Pascal, DGX-1 Volta, DGX Station) and on NVIDIA Tesla GPUs in the public cloud for worry-free deployments of AI workloads.

Availability

A release candidate of Kubernetes on NVIDIA GPUs is now freely available for testing.

Release candidate Install Instructions >

Kubernetes on NVIDIA GPUs source on GitHub >

NVIDIA is developing GPU enhancements to open-source Kubernetes and is working closely with the Kubernetes community to contribute them for the benefit of the larger ecosystem. Because NVIDIA is iterating faster than upstream Kubernetes releases, these enhancements are being made available immediately as NVIDIA-provided installers and source code.

If you’d like to be notified when new releases are available for download, please sign up for the interest list below. To file bugs and enhancement requests, sign up for the free NVIDIA registered developers program and navigate to My account > My bugs.

INTEREST LIST SIGN-UP

Key Features

  • Enables GPU support in Kubernetes using the NVIDIA device plugin
  • Specify GPU attributes such as GPU type and memory requirements for deployment in heterogeneous GPU clusters
  • Visualize and monitor GPU metrics and health with an integrated GPU monitoring stack of NVIDIA DCGM, Prometheus and Grafana
  • Support for multiple underlying container runtimes such as Docker and CRI-O
  • Officially supported on all NVIDIA DGX systems (DGX-1 Pascal, DGX-1 Volta and DGX Station)
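To make the device plugin feature more concrete, here is a minimal Python sketch that builds a Pod manifest requesting GPUs through `nvidia.com/gpu`, the extended resource name the NVIDIA device plugin advertises. This page does not give the syntax for specifying GPU type or memory attributes, so the sketch stands in an ordinary node selector on a hypothetical `accelerator` label; the helper name, image, and label values are likewise assumptions for illustration.

```python
import json


def gpu_pod_manifest(name, image, gpus=1, gpu_type_label=None):
    """Build a Pod manifest (as a plain dict) that requests NVIDIA GPUs."""
    if gpus < 1:
        raise ValueError("gpus must be a positive integer")
    spec = {
        "restartPolicy": "OnFailure",
        "containers": [
            {
                "name": name,
                "image": image,
                # Extended resource advertised by the NVIDIA device plugin.
                # GPUs are requested in whole units under "limits".
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }
        ],
    }
    if gpu_type_label is not None:
        # ASSUMPTION: cluster nodes carry a hypothetical "accelerator" label.
        # This is a generic Kubernetes node selector, not necessarily the
        # attribute mechanism that Kubernetes on NVIDIA GPUs provides.
        spec["nodeSelector"] = {"accelerator": gpu_type_label}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest(
        "cuda-job",
        "example.com/cuda-app:latest",  # placeholder image name
        gpus=2,
        gpu_type_label="nvidia-tesla-v100",  # hypothetical label value
    )
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON manifests as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f`, provided the device plugin is running on the cluster.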