NVIDIA EGX Stack

From the enterprise to the edge, the NVIDIA EGX™ stack delivers a cloud-native platform for GPU-accelerated machine learning, deep learning, and high-performance computing (HPC). Use the EGX stack to quickly and painlessly run GPU-optimized NGC™ containers on NVIDIA-Certified servers.



The NVIDIA EGX stack is a software stack that includes NVIDIA drivers, a Kubernetes plug-in, a container runtime, and containerized AI frameworks and applications, including NVIDIA® TensorRT™, NVIDIA Triton™ Inference Server, and the NVIDIA DeepStream SDK. It is optimized for NVIDIA-Certified systems, and Helm charts for installing it are available in the NVIDIA NGC catalog.


Cloud Native

Cloud-native technologies include microservices, containerization, and declarative automation. The EGX stack is built with cloud-native technologies and is engineered to run GPU-optimized NGC containers, enabling developer-focused workflows and software-lifecycle agility.

Open Source

Open-source software fuels technology innovation. NVIDIA contributes to open-source projects and communities, including container runtimes, Kubernetes extensions, and monitoring tools. Open-source software is the foundation for the EGX stack.

Performance and Scale

From edge to enterprise, the EGX stack delivers performance at scale. It supports NVIDIA A100 Tensor Core GPUs, based on the NVIDIA Ampere architecture, including the EGX converged accelerators, which let performance scale across multiple nodes.

Robust Ecosystem

NVIDIA-Certified systems pass an extensive suite of tests that validate their ability to deliver high performance running NGC containers. NVIDIA-Certified edge servers pass additional tests for remote management and security.




NVIDIA GPU Operator

The NVIDIA GPU Operator uses the Kubernetes operator framework to automate the management of all NVIDIA software components needed to provision GPUs. These components include NVIDIA drivers to enable CUDA®, a Kubernetes device plug-in for GPUs, the NVIDIA container runtime, automatic node labeling, and an NVIDIA Data Center GPU Manager (DCGM)-based monitoring agent.
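Once the device plug-in is running, GPUs are advertised to the Kubernetes scheduler as the extended resource nvidia.com/gpu, and a workload claims them through its resource limits. The following is a minimal sketch of that idea, not an official NVIDIA example: it builds a Pod manifest that requests one GPU and runs nvidia-smi as a smoke test. The helper name gpu_pod_manifest, the pod name, and the image reference are hypothetical placeholders; substitute an NGC container image that your cluster can pull.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Return a Kubernetes Pod manifest (as a dict) that requests NVIDIA GPUs.

    `name` and `image` are caller-supplied; nothing here is specific to EGX.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Run once and exit; this is a smoke test, not a service.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Assumes the container runtime makes nvidia-smi
                    # available inside the container.
                    "command": ["nvidia-smi"],
                    # Extended resource exposed by the NVIDIA device plug-in.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference; replace with a real NGC container.
    manifest = gpu_pod_manifest("gpu-smoke-test", "example.com/cuda-base:latest")
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest can be piped to `kubectl apply -f -` on a cluster where the GPU Operator has finished deploying its components.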




NVIDIA Network Operator

The NVIDIA Network Operator uses Kubernetes custom resource definitions (CRDs) and the Operator SDK to enable fast networking, remote direct memory access (RDMA), and GPUDirect® in a Kubernetes cluster. It supports RDMA shared devices, the Mellanox Kubernetes device plug-in, and the GPUDirect RDMA NVIDIA peer memory driver.




EGX Partners

The EGX stack architecture is supported by leading hybrid-cloud platform partners, which ensures that clusters built on their platforms are configured for GPU workloads and perform consistently.





Resources

Build cloud-native applications on the EGX stack.
