The age of AI-native applications has arrived. Developers are building advanced agentic and physical AI systems—but scaling across geographies and GPU providers remains a challenge.
NVIDIA built DGX Cloud Lepton to help. It’s a unified AI platform and compute marketplace that connects developers to tens of thousands of GPUs from a global network of cloud providers. And it’s now available for early access.
DGX Cloud Lepton addresses a critical need: accelerating AI developer productivity by providing unified access to GPU capacity and AI services across the NVIDIA compute ecosystem. It integrates seamlessly with the NVIDIA software stack—including NVIDIA NIM and NVIDIA NeMo—and will soon support NVIDIA Blueprints and NVIDIA Cloud Functions (NVCF). It enables developers to build, train, and deploy AI applications quickly and at scale.
Developers can kickstart AI development using build.nvidia.com with instant access to NVIDIA NIM microservices and prebuilt workflows. When it’s time to scale training, fine-tuning, or inference across geographies and providers, NVIDIA DGX Cloud Lepton provides production-grade compute infrastructure and orchestration.
Global cloud providers—including Amazon Web Services, Firebird, Fluidstack, Mistral AI, Nebius, Nscale, Scaleway, and Together AI—have made NVIDIA Blackwell and other GPUs available in DGX Cloud Lepton. Additionally, Hugging Face plans to integrate DGX Cloud Lepton into its Training Cluster as a Service to expand AI researchers’ access to scalable compute for model training. These companies join existing partners such as CoreWeave, Crusoe, Firmus, Foxconn, GMI Cloud, Lambda, and Yotta Data Services. Watch for more soon.
Developers can access compute resources through bring-your-own-capacity options from partners. This flexibility supports sovereign AI initiatives and strategic data locality requirements.
This post explains how NVIDIA DGX Cloud Lepton empowers developers to build and scale AI applications seamlessly, leveraging compute from multiple cloud providers.

Key benefits for developers
Whether you’re training large language models or serving real-time inference, DGX Cloud Lepton is designed to help you spend less time managing infrastructure—and more time building.
1. Simplified GPU discovery: Discover and allocate GPU resources across cloud providers through a single platform. Determine optimal workload placement based on region, cost, and performance while standardizing on familiar AI tooling.
2. Consistent development environments: Work in a standardized development environment, regardless of underlying infrastructure.
3. Streamlined multi-cloud administration: DGX Cloud Lepton reduces operational silos and friction, enabling seamless administration and scaling across multiple cloud providers.
4. Multi-region and data sovereignty support: Access GPUs in specific regions to meet data residency requirements. Increase performance and reduce latency by deploying workloads in close proximity to application consumers.
5. Built-in reliability and resilience: DGX Cloud Lepton leverages GPUd for continuous GPU health monitoring, intelligent workload scheduling, and fault isolation to ensure stable and predictable performance.
DGX Cloud Lepton features
Core capabilities
- Dev pods: Dev pods support interactive AI/ML development through Jupyter notebooks, SSH, and Visual Studio Code. They are ideal for prototyping, debugging, and iterative model experimentation.
- Batch jobs: Batch jobs are suited for running large-scale, non-interactive workloads such as model training and data preprocessing across multiple nodes. You can specify CPU, GPU, and memory requirements; select node groups; and monitor performance through real-time metrics like GPU utilization, memory consumption, and GPU temperature. Each job provides detailed status and host-level visibility for every replica.
- Inference endpoints: You can deploy and manage a wide range of models, including base, fine-tuned, and custom-built models. Inference endpoints support NVIDIA NIM microservices or your own containers, offering flexible deployment options. The system automatically scales model replicas based on demand for high availability and performance, and built-in health monitoring and resilience features reduce downtime and ensure reliable operation. (A sketch of the batch job and endpoint workflows follows this list.)
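To make these capabilities concrete, here is a minimal sketch of submitting a batch job and deploying an inference endpoint programmatically. It assumes a hypothetical REST API: the routes, payload fields, response fields, and environment variable names below are illustrative placeholders, not DGX Cloud Lepton’s documented interface.

```python
import os

import requests

# Hypothetical workspace URL and API token, read from the environment.
# Both names are assumptions for this sketch, not documented settings.
BASE_URL = os.environ["LEPTON_WORKSPACE_URL"]
HEADERS = {"Authorization": f"Bearer {os.environ['LEPTON_API_TOKEN']}"}

# Submit a multi-node batch training job (payload shape is illustrative).
job_spec = {
    "name": "llm-finetune",
    "container_image": "nvcr.io/nvidia/pytorch:24.05-py3",
    "command": ["torchrun", "--nproc_per_node=8", "train.py"],
    "node_group": "us-west-h100",   # node group set up by your admin
    "replicas": 4,                  # number of nodes
    "resources": {"gpus_per_node": 8, "cpu": 96, "memory_gib": 1024},
}
job = requests.post(f"{BASE_URL}/batch-jobs", json=job_spec, headers=HEADERS)
job.raise_for_status()
print("job id:", job.json()["id"])

# Deploy an autoscaling inference endpoint from a NIM or custom container.
endpoint_spec = {
    "name": "llama-chat",
    "container_image": "nvcr.io/nim/meta/llama-3.1-8b-instruct:latest",
    "min_replicas": 1,
    "max_replicas": 8,              # platform scales replicas with demand
    "resources": {"gpus_per_replica": 1},
}
endpoint = requests.post(
    f"{BASE_URL}/inference-endpoints", json=endpoint_spec, headers=HEADERS
)
endpoint.raise_for_status()
print("endpoint url:", endpoint.json()["url"])
```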

Monitoring and observability
- Health monitoring: Continuously monitor GPU and system health in real time with advanced diagnostics, including GPUd, NCCL benchmarks, and proactive alerts to identify issues. All nodes undergo rigorous validation, such as NCCL testing and GPU burn-in, ensuring they meet performance and reliability standards. The platform automatically isolates unhealthy nodes from the scheduler to prevent disruption, while real-time telemetry and customizable auto-recovery workflows maintain operational stability and workload resilience.

- Custom workspace settings: Easily configure quotas, access controls, secrets management, billing settings, and container registries to meet enterprise requirements.
- Observability tools: Stream logs in real time, manage job lifecycles, and securely inspect API activity on a per-user basis to maintain visibility and operational control across the platform. (A health-check and log-streaming sketch follows below.)
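As an illustration of what programmatic observability could look like, the sketch below polls node health and streams job logs. It is a minimal sketch assuming the same hypothetical REST API as above; the routes and field names are assumptions, not the platform’s documented API.

```python
import os

import requests

# Hypothetical workspace URL and token (assumptions for illustration).
BASE_URL = os.environ["LEPTON_WORKSPACE_URL"]
HEADERS = {"Authorization": f"Bearer {os.environ['LEPTON_API_TOKEN']}"}

# List nodes in a node group and flag any that health checks (e.g., GPUd
# diagnostics) have marked unhealthy. Field names are illustrative.
resp = requests.get(f"{BASE_URL}/node-groups/us-west-h100/nodes", headers=HEADERS)
resp.raise_for_status()
for node in resp.json():
    if node.get("health") != "healthy":
        print(f"node {node['id']}: {node.get('health')} ({node.get('last_diagnostic')})")

# Stream logs for a running batch job in real time over a chunked response.
with requests.get(
    f"{BASE_URL}/batch-jobs/llm-finetune/logs", headers=HEADERS, stream=True
) as log_stream:
    log_stream.raise_for_status()
    for line in log_stream.iter_lines(decode_unicode=True):
        if line:
            print(line)
```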

Getting started with DGX Cloud Lepton
You get a consistent experience across web user interfaces, command-line interfaces, and SDKs—whether you’re prototyping or deploying in production. Once onboarded, each customer receives a workspace: a secure environment to manage GPU resources and run workloads.
Admins configure settings such as user access controls, secrets, container registries, and usage quotas. GPU resources are placed into node groups, which serve as the foundation for compute workloads.
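For a flavor of what that admin-side setup might involve, here is a brief hypothetical sketch using the same assumed REST API as the earlier examples; the routes, payloads, and secret names are illustrative, not documented DGX Cloud Lepton interfaces.

```python
import os

import requests

# Hypothetical workspace URL and an admin-scoped token (both assumptions).
BASE_URL = os.environ["LEPTON_WORKSPACE_URL"]
HEADERS = {"Authorization": f"Bearer {os.environ['LEPTON_ADMIN_TOKEN']}"}

# Register a container registry so workloads can pull private images.
# "$oauthtoken" is the conventional username for NVIDIA NGC; the stored
# secret name and route are illustrative.
registry = requests.post(f"{BASE_URL}/registries", headers=HEADERS, json={
    "name": "team-registry",
    "url": "nvcr.io",
    "username": "$oauthtoken",
    "password_secret": "NGC_API_KEY",   # reference to a stored secret
})
registry.raise_for_status()

# Cap a user's GPU quota (illustrative shape).
quota = requests.put(
    f"{BASE_URL}/quotas/users/alice", headers=HEADERS, json={"max_gpus": 16}
)
quota.raise_for_status()
```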
You can then:
- Launch dev pods for interactive development
- Submit batch jobs for model training or data processing
- Deploy inference endpoints for real-time or batch model serving
DGX Cloud Lepton streamlines the deployment of containerized AI and machine-learning workloads. It allows you to bring your own workloads as container images, with support for any OCI-compliant container registry, including the NVIDIA NGC container registry.
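As a final sketch, launching a dev pod from your own container image might look like the following. As before, this assumes a hypothetical API surface; the route, payload fields, and response fields are placeholders for illustration.

```python
import os

import requests

# Hypothetical workspace URL and token (assumptions for illustration).
BASE_URL = os.environ["LEPTON_WORKSPACE_URL"]
HEADERS = {"Authorization": f"Bearer {os.environ['LEPTON_API_TOKEN']}"}

# Launch an interactive dev pod from a container image in an OCI-compliant
# registry (here, an NGC PyTorch base image), with SSH and Jupyter enabled.
pod_spec = {
    "name": "proto-pod",
    "container_image": "nvcr.io/nvidia/pytorch:24.05-py3",
    "resources": {"gpus": 1, "cpu": 16, "memory_gib": 128},
    "enable_ssh": True,
    "enable_jupyter": True,
}
pod = requests.post(f"{BASE_URL}/dev-pods", json=pod_spec, headers=HEADERS)
pod.raise_for_status()
print("connect with:", pod.json()["ssh_command"])
```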

Join the DGX Cloud Lepton Early Access Program
Explore DGX Cloud Lepton in Early Access (EA) and experience firsthand how it can improve your generative AI development process. If you’re selected, the DGX Cloud Lepton product team will engage with you to understand your use cases and compute requirements. We’re excited to see the innovative applications you’ll build with these new capabilities!
To learn more, refer to the documentation.