NVIDIA Grove
NVIDIA Grove is an open-source Kubernetes API that defines the structure and lifecycle of single- and multi-node AI inference workloads, such as those deployed with NVIDIA Dynamo, while enabling them to scale efficiently in Kubernetes-based environments.
Purpose-built for orchestrating large-scale AI workloads with complex requirements in GPU clusters, Grove lets developers describe multi-component workloads—including specific roles, dependencies, multi-level-scaling rules, and startup order—within a single custom resource. Grove is a modular component of NVIDIA Dynamo, but it can also be deployed as a standalone solution or integrated into other high-performance inference frameworks.
How NVIDIA Grove Works
High-performance inference frameworks use Grove's hierarchical APIs to express role-specific logic and multi-level scaling, enabling consistent, optimized deployment across diverse cluster environments. Grove achieves this by orchestrating multi-component AI workloads using three hierarchical custom resources in its workload API.
PodCliques represent groups of Kubernetes pods with specific roles, such as prefill worker, decode leader, or frontend service, each with independent configuration and scaling logic.
PodCliqueScalingGroups bundle tightly coupled PodCliques that must scale together, like the prefill leader and prefill workers that need coordinated scaling behavior.
PodCliqueSets define the entire multi-component workload, specifying startup order, scaling policies, and gang-scheduling constraints that ensure all components start together or fail together. When scaling for additional capacity, Grove creates complete replicas of the entire PodCliqueSet and defines spread constraints that distribute these replicas across the cluster for high availability, while keeping each replica's components network-packed for optimal performance.
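To make the three-level hierarchy concrete, here is a minimal sketch that models it with plain Python dataclasses. The class and field names (`PodClique`, `ScalingGroup`, `PodCliqueSet`, `replicas`, `starts_after`) and the example workload are hypothetical illustrations of the concepts described above, not Grove's actual CRD schema, which is documented in the Grove API reference. The sketch counts the pods one PodCliqueSet replica needs and derives a startup order from declared dependencies.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class PodClique:
    """Pods sharing one role (hypothetical model, not Grove's real schema)."""
    name: str
    replicas: int
    starts_after: list[str] = field(default_factory=list)  # cliques that must start first


@dataclass
class ScalingGroup:
    """Tightly coupled cliques that scale as one unit (models a PodCliqueScalingGroup)."""
    name: str
    cliques: list[PodClique]
    replicas: int = 1  # copies of the whole bundle


@dataclass
class PodCliqueSet:
    """The whole workload; each replica contains every clique and scaling group."""
    name: str
    cliques: list[PodClique]
    scaling_groups: list[ScalingGroup]
    replicas: int = 1

    def pods_per_replica(self) -> int:
        standalone = sum(c.replicas for c in self.cliques)
        grouped = sum(g.replicas * sum(c.replicas for c in g.cliques)
                      for g in self.scaling_groups)
        return standalone + grouped

    def total_pods(self) -> int:
        return self.replicas * self.pods_per_replica()

    def startup_order(self) -> list[str]:
        # Topologically sort cliques so each starts only after its dependencies.
        every = self.cliques + [c for g in self.scaling_groups for c in g.cliques]
        graph = {c.name: set(c.starts_after) for c in every}
        return list(TopologicalSorter(graph).static_order())


# Hypothetical disaggregated-serving workload.
workload = PodCliqueSet(
    name="disagg-llm",
    cliques=[
        PodClique("decode", replicas=4),
        PodClique("frontend", replicas=1, starts_after=["decode", "prefill-leader"]),
    ],
    scaling_groups=[
        ScalingGroup(
            "prefill",
            cliques=[
                PodClique("prefill-leader", replicas=1),
                PodClique("prefill-worker", replicas=3, starts_after=["prefill-leader"]),
            ],
            replicas=2,
        )
    ],
    replicas=2,
)

print(workload.pods_per_replica())  # 13 = decode 4 + frontend 1 + 2 x (leader 1 + workers 3)
print(workload.total_pods())        # 26
print(workload.startup_order())     # dependencies always appear before their dependents
```

Note how scaling the `prefill` group multiplies the leader and workers together, while scaling the PodCliqueSet multiplies the entire stack, mirroring the two levels of scaling described above.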
A Grove-enabled Kubernetes cluster requires the Grove operator and a scheduler that understands PodGang resources, such as the KAI Scheduler.
When a PodCliqueSet resource is created, Grove's operator validates the specification and automatically generates the necessary Kubernetes resources, including the constituent PodCliques, PodCliqueScalingGroups, and associated services, secrets, and autoscaling policies. The Grove operator then creates PodGang resources that translate workload requirements into scheduling constraints for the scheduler. Each PodGang contains PodGroups with minimum replica guarantees, network topology packing requirements for performance, and spread constraints for availability. Together, these constraints enable topology-aware placement and efficient resource utilization across the cluster.
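The validate-then-translate step can be sketched as follows. Here a PodGang is modeled as a tuple of PodGroups, one per PodClique, each with a minimum replica count, plus a packing domain. The dataclass layout, the `min_replicas` field, the `build_podgang` helper, and the `"rack"` topology string are all hypothetical stand-ins; the real PodGang schema is defined by Grove and consumed by the scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodGroup:
    """One clique's pods inside a gang (hypothetical model of a PodGang entry)."""
    clique: str
    replicas: int      # pods the clique wants
    min_replicas: int  # pods that must be placed for the gang to be admitted


@dataclass(frozen=True)
class PodGang:
    name: str
    pod_groups: tuple[PodGroup, ...]
    pack_domain: str   # assumed topology level to pack within, e.g. "rack"


def build_podgang(name: str, cliques: dict[str, tuple[int, int]], pack_domain: str) -> PodGang:
    """Validate (replicas, min_replicas) for each clique, then emit one PodGroup per clique."""
    groups = []
    for clique, (replicas, min_replicas) in cliques.items():
        if not 0 < min_replicas <= replicas:
            raise ValueError(
                f"{clique}: need 0 < min_replicas <= replicas, got {min_replicas}/{replicas}")
        groups.append(PodGroup(clique, replicas, min_replicas))
    return PodGang(name, tuple(groups), pack_domain)


gang = build_podgang(
    "disagg-llm-0",
    {"frontend": (1, 1), "decode": (4, 2), "prefill-worker": (3, 3)},
    pack_domain="rack",
)
print(sum(g.min_replicas for g in gang.pod_groups))  # 6 pods must be placed together
```

Rejecting an invalid spec before any PodGang exists reflects the validation step described above: the scheduler only ever sees constraints that can, in principle, be satisfied.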
The scheduler watches for these PodGang resources and applies gang-scheduling logic, ensuring all required components are scheduled together or not at all, while optimizing placement based on GPU cluster topology. This process results in coordinated deployment of multi-component AI stacks where prefill services, decode workers, and routing components start in the correct order with optimal network placement, preventing resource deadlocks and partial deployments that waste resources in the cluster.
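The all-or-nothing behavior can be illustrated with a toy admission check. This is a simplified sketch, assuming each node offers a fixed number of single-GPU slots and each pod needs exactly one slot; the `gang_admit` function and node names are hypothetical, and the sketch does not reproduce the KAI Scheduler's actual algorithm, which also accounts for topology, priorities, and queues.

```python
from typing import Dict, List, Optional


def gang_admit(min_pods: Dict[str, int],
               free_slots: Dict[str, int]) -> Optional[Dict[str, List[str]]]:
    """Place every group's minimum pod count, or place nothing at all (hypothetical).

    min_pods: pods required per PodGroup (clique name -> count).
    free_slots: free single-GPU slots per node (node name -> count).
    Returns a clique -> nodes plan, or None if the whole gang cannot fit.
    """
    remaining = dict(free_slots)  # work on a copy: nothing is committed unless all fit
    plan: Dict[str, List[str]] = {}
    for clique, count in min_pods.items():
        nodes = []
        for _ in range(count):
            # First-fit over sorted node names fills one node before the next,
            # packing the gang onto as few nodes as possible.
            node = next((n for n in sorted(remaining) if remaining[n] > 0), None)
            if node is None:
                return None  # gang does not fit; discard the partial plan
            remaining[node] -= 1
            nodes.append(node)
        plan[clique] = nodes
    return plan


free = {"node-a": 4, "node-b": 4}
print(gang_admit({"decode": 4, "prefill": 3}, free))
# {'decode': ['node-a', 'node-a', 'node-a', 'node-a'], 'prefill': ['node-b', 'node-b', 'node-b']}
print(gang_admit({"decode": 4, "prefill": 5}, free))  # None: 9 pods cannot fit in 8 slots
print(free)  # {'node-a': 4, 'node-b': 4}, untouched by either call
```

Planning against a copy of the free-capacity map is what makes the check all-or-nothing: a gang that fails midway leaves cluster state untouched instead of holding GPUs for a partial deployment.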
Quick-Start Guide
Deploy your first AI inference workload using PodCliqueSets, PodCliques, and PodCliqueScalingGroups, going from installation to running disaggregated inference on Kubernetes in minutes.
Why Grove: The Orchestration and Scaling Problem
Read how Grove replaces the dozens of YAML files and manual coordination behind complex AI inference workloads with a single, declarative custom resource.
Discover More About Grove
Read the complete API reference, advanced configuration options, and detailed guides for deploying Grove in production environments.
Get Started With NVIDIA Grove
Install Grove on Kubernetes and run your first multi-component AI workload.
Get Grove Running on Your Cluster
Installing Grove deploys the Grove operator and creates the necessary CRDs for PodCliqueSets, PodCliques, and PodCliqueScalingGroups, along with controllers that manage workloads and generate scheduling constraints.
Take a Deep Dive Into NVIDIA Grove
Learn what problems Grove solves, its key capabilities, and how it enables declarative workload definition through easy-to-use, high-level APIs that drive scheduler-level optimizations.
NVIDIA Grove Starter Kits
Disaggregated Serving
Disaggregated inference separates model serving into specialized components (prefill, decode, routing) based on their different needs. This kit explores the architectural patterns and orchestration challenges of disaggregated serving.
Basics of LLM Inference (Tech Blog)
Scheduling AI Workloads
Scheduling workloads plays a critical role throughout the entire AI lifecycle, from initial model training to inference. This kit covers advanced scheduling concepts essential for high-performance AI workloads in the context of the KAI Scheduler.
Why Scheduling Matters for Disaggregated Serving (Tech Blog)
Gang Scheduling and Workload Prioritization (Tech Blog)
Inference Optimization
Maximizing AI inference performance requires understanding and applying advanced optimization techniques across hardware and software. This kit covers different approaches to achieve optimal throughput and latency in production environments.
Speculative Decoding for Higher Throughput (Tech Blog)
Choosing Between Pipeline and Tensor Parallelism (Tech Blog)
MultiShot Communication Protocol (Tech Blog)
Ethical AI
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their supporting model team to ensure the model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety and Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.
Get started with NVIDIA Grove today.