The age of AI-native applications has arrived. Developers are building advanced agentic and physical AI systems—but scaling across geographies and GPU providers remains a challenge.
NVIDIA built DGX Cloud Lepton to help. It’s a unified AI platform and compute marketplace that connects developers to tens of thousands of GPUs from a global network of cloud providers. And it’s now available for early access.
DGX Cloud Lepton addresses a critical need: accelerating AI developer productivity by providing unified access to GPU capacity and AI services across the NVIDIA compute ecosystem. It integrates seamlessly with the NVIDIA software stack—including NVIDIA NIM and NVIDIA NeMo—and will soon support NVIDIA Blueprints and NVIDIA Cloud Functions (NVCF). It enables developers to build, train, and deploy AI applications quickly and at scale.
Developers can kickstart AI development using build.nvidia.com with instant access to NVIDIA NIM microservices and prebuilt workflows. When it’s time to scale training, fine-tuning, or inference across geographies and providers, NVIDIA DGX Cloud Lepton provides production-grade compute infrastructure and orchestration.
Global cloud providers—including Amazon Web Services, Firebird, Fluidstack, Mistral AI, Nebius, Nscale, Scaleway, and Together AI—have made NVIDIA Blackwell and other GPUs available in DGX Cloud Lepton. Additionally, Hugging Face plans to integrate DGX Cloud Lepton into its Training Cluster as a Service to expand AI researchers’ access to scalable compute for model training. These companies join existing partners such as CoreWeave, Crusoe, Firmus, Foxconn, GMI Cloud, Lambda, and Yotta Data Services. Watch for more soon.
Developers can access compute resources through bring-your-own-capacity options from partners. This flexibility supports sovereign AI initiatives and strategic data locality requirements.
This post explains how NVIDIA DGX Cloud Lepton empowers developers to build and scale AI applications seamlessly, leveraging compute from multiple cloud providers.

Key benefits for developers
Whether you’re training large language models or serving real-time inference, DGX Cloud Lepton is designed to help you spend less time managing infrastructure—and more time building.
1. Simplified GPU discovery: Discover and allocate GPU resources across cloud providers through a single platform. Determine optimal workload placement based on region, cost, and performance while standardizing on familiar AI tooling.
2. Consistent development environments: Work in a standardized development environment, regardless of underlying infrastructure.
3. Streamlined multi-cloud administration: DGX Cloud Lepton reduces operational silos and friction, enabling seamless administration and scaling across multiple cloud providers.
4. Multi-region and data sovereignty support: Access GPUs in specific regions to meet data residency requirements. Increase performance and reduce latency by deploying workloads in close proximity to application consumers.
5. Built-in reliability and resilience: DGX Cloud Lepton leverages GPUd for continuous GPU health monitoring, intelligent workload scheduling, and fault isolation to ensure stable and predictable performance.
DGX Cloud Lepton features
Core capabilities
- Dev pods: Dev pods support interactive AI/ML development through Jupyter notebooks, SSH, and Visual Studio Code. They are ideal for prototyping, debugging, and iterative model experimentation.
- Batch jobs: Batch jobs are suited for running large-scale, non-interactive workloads such as model training and data preprocessing across multiple nodes. You can specify CPU, GPU, and memory requirements; select node groups; and monitor performance through real-time metrics like GPU utilization, memory consumption, and GPU temperature. Each job provides detailed status and host-level visibility for every replica.
- Inference endpoints: You can deploy and manage a wide range of models, including base, fine-tuned, and custom-built models. Inference endpoints support NVIDIA NIM microservices or your own containers, offering flexible deployment options. The system automatically scales model replicas based on demand for high availability and performance, and built-in health monitoring and resilience features reduce downtime and ensure reliable operation. (A sketch of the batch job and endpoint workflows follows this list.)
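To make these capabilities concrete, here is a minimal sketch of submitting a batch job and deploying an inference endpoint programmatically. It assumes a hypothetical REST API: the routes, payload fields, response fields, and environment variable names below are illustrative placeholders, not DGX Cloud Lepton’s documented interface.

```python
import os

import requests

# Hypothetical workspace URL and API token, read from the environment.
# Both names are assumptions for this sketch, not documented settings.
BASE_URL = os.environ["LEPTON_WORKSPACE_URL"]
HEADERS = {"Authorization": f"Bearer {os.environ['LEPTON_API_TOKEN']}"}

# Submit a multi-node batch training job (payload shape is illustrative).
job_spec = {
    "name": "llm-finetune",
    "container_image": "nvcr.io/nvidia/pytorch:24.05-py3",
    "command": ["torchrun", "--nproc_per_node=8", "train.py"],
    "node_group": "us-west-h100",   # node group set up by your admin
    "replicas": 4,                  # number of nodes
    "resources": {"gpus_per_node": 8, "cpu": 96, "memory_gib": 1024},
}
job = requests.post(f"{BASE_URL}/batch-jobs", json=job_spec, headers=HEADERS)
job.raise_for_status()
print("job id:", job.json()["id"])

# Deploy an autoscaling inference endpoint from a NIM or custom container.
endpoint_spec = {
    "name": "llama-chat",
    "container_image": "nvcr.io/nim/meta/llama-3.1-8b-instruct:latest",
    "min_replicas": 1,
    "max_replicas": 8,              # platform scales replicas with demand
    "resources": {"gpus_per_replica": 1},
}
endpoint = requests.post(
    f"{BASE_URL}/inference-endpoints", json=endpoint_spec, headers=HEADERS
)
endpoint.raise_for_status()
print("endpoint url:", endpoint.json()["url"])
```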

Monitoring and observability
- Health monitoring: Continuously monitor GPU and system health in real time with advanced diagnostics, including GPUd, NCCL benchmarks, and proactive alerts to identify issues. All nodes undergo rigorous validation, such as NCCL testing and GPU burn-in, ensuring they meet performance and reliability standards. The platform automatically isolates unhealthy nodes from the scheduler to prevent disruption, while real-time telemetry and customizable auto-recovery workflows maintain operational stability and workload resilience.

- Custom workspace settings: Easily configure quotas, access controls, secrets management, billing settings, and container registries to meet enterprise requirements.
- Observability tools: Stream logs in real time, manage job lifecycles, and securely inspect API activity on a per-user basis to maintain visibility and operational control across the platform. (A health-check and log-streaming sketch follows below.)
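As an illustration of what programmatic observability could look like, the sketch below polls node health and streams job logs. It is a minimal sketch assuming the same hypothetical REST API as above; the routes and field names are assumptions, not the platform’s documented API.

```python
import os

import requests

# Hypothetical workspace URL and token (assumptions for illustration).
BASE_URL = os.environ["LEPTON_WORKSPACE_URL"]
HEADERS = {"Authorization": f"Bearer {os.environ['LEPTON_API_TOKEN']}"}

# List nodes in a node group and flag any that health checks (e.g., GPUd
# diagnostics) have marked unhealthy. Field names are illustrative.
resp = requests.get(f"{BASE_URL}/node-groups/us-west-h100/nodes", headers=HEADERS)
resp.raise_for_status()
for node in resp.json():
    if node.get("health") != "healthy":
        print(f"node {node['id']}: {node.get('health')} ({node.get('last_diagnostic')})")

# Stream logs for a running batch job in real time over a chunked response.
with requests.get(
    f"{BASE_URL}/batch-jobs/llm-finetune/logs", headers=HEADERS, stream=True
) as log_stream:
    log_stream.raise_for_status()
    for line in log_stream.iter_lines(decode_unicode=True):
        if line:
            print(line)
```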

Getting started with DGX Cloud Lepton
You get a consistent experience across web user interfaces, command-line interfaces, and SDKs—whether you’re prototyping or deploying in production. Once onboarded, each customer receives a workspace: a secure environment to manage GPU resources and run workloads.
Admins configure settings such as user access controls, secrets, container registries, and usage quotas. GPU resources are placed into node groups, which serve as the foundation for compute workloads.
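For a flavor of what that admin-side setup might involve, here is a brief hypothetical sketch using the same assumed REST API as the earlier examples; the routes, payloads, and secret names are illustrative, not documented DGX Cloud Lepton interfaces.

```python
import os

import requests

# Hypothetical workspace URL and an admin-scoped token (both assumptions).
BASE_URL = os.environ["LEPTON_WORKSPACE_URL"]
HEADERS = {"Authorization": f"Bearer {os.environ['LEPTON_ADMIN_TOKEN']}"}

# Register a container registry so workloads can pull private images.
# "$oauthtoken" is the conventional username for NVIDIA NGC; the stored
# secret name and route are illustrative.
registry = requests.post(f"{BASE_URL}/registries", headers=HEADERS, json={
    "name": "team-registry",
    "url": "nvcr.io",
    "username": "$oauthtoken",
    "password_secret": "NGC_API_KEY",   # reference to a stored secret
})
registry.raise_for_status()

# Cap a user's GPU quota (illustrative shape).
quota = requests.put(
    f"{BASE_URL}/quotas/users/alice", headers=HEADERS, json={"max_gpus": 16}
)
quota.raise_for_status()
```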
You can then:
- Launch dev pods for interactive development
- Submit batch jobs for model training or data processing
- Deploy inference endpoints for real-time or batch model serving
DGX Cloud Lepton streamlines the deployment of containerized AI and machine-learning workloads. It allows you to bring your own workloads as container images, with support for any OCI-compliant container registry, including the NVIDIA NGC container registry.
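As a final sketch, launching a dev pod from your own container image might look like the following. As before, this assumes a hypothetical API surface; the route, payload fields, and response fields are placeholders for illustration.

```python
import os

import requests

# Hypothetical workspace URL and token (assumptions for illustration).
BASE_URL = os.environ["LEPTON_WORKSPACE_URL"]
HEADERS = {"Authorization": f"Bearer {os.environ['LEPTON_API_TOKEN']}"}

# Launch an interactive dev pod from a container image in an OCI-compliant
# registry (here, an NGC PyTorch base image), with SSH and Jupyter enabled.
pod_spec = {
    "name": "proto-pod",
    "container_image": "nvcr.io/nvidia/pytorch:24.05-py3",
    "resources": {"gpus": 1, "cpu": 16, "memory_gib": 128},
    "enable_ssh": True,
    "enable_jupyter": True,
}
pod = requests.post(f"{BASE_URL}/dev-pods", json=pod_spec, headers=HEADERS)
pod.raise_for_status()
print("connect with:", pod.json()["ssh_command"])
```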

Join the DGX Cloud Lepton Early Access Program
Explore DGX Cloud Lepton in Early Access (EA) and experience firsthand how it can improve your generative AI development process. If you’re selected, the DGX Cloud Lepton product team will engage with you to understand your use cases and compute requirements. We’re excited to see the innovative applications you’ll build with these new capabilities!
To learn more, refer to the documentation.