NVIDIA Triton Management Service

Automate the deployment of multiple Triton Inference Server instances in Kubernetes with resource-efficient model orchestration.

What Is NVIDIA Triton Management Service?

NVIDIA Triton™, part of the NVIDIA® AI platform, offers a new capability called Triton Management Service (TMS) that automates the deployment of multiple Triton Inference Server instances in Kubernetes with resource-efficient model orchestration on GPUs and CPUs. This software application manages the deployment of Triton Inference Server instances with one or more AI models, allocates models to individual GPUs and CPUs, and efficiently colocates models by framework. TMS, available exclusively with NVIDIA AI Enterprise, an enterprise-grade AI software platform, enables large-scale inference deployment with high performance and hardware utilization.
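As a rough illustration of the kind of placement decision TMS automates, the Python sketch below groups models by framework and first-fit packs them onto GPUs by memory footprint. The model list, the 16 GB per-GPU budget, and the greedy policy are all assumptions made for illustration; TMS's actual scheduling policy is not documented here.

```python
from collections import defaultdict

GPU_MEMORY_MB = 16_000  # assumed per-GPU memory budget

def place_models(models):
    """Toy placement: group models by framework so each GPU hosts a single
    backend, then first-fit pack by memory footprint. A stand-in for the
    placement decisions TMS automates, not its real algorithm."""
    gpus = []  # each entry: {"framework": str, "free": int, "models": [...]}
    by_framework = defaultdict(list)
    for m in models:
        by_framework[m["framework"]].append(m)
    for framework, group in by_framework.items():
        # Largest models first improves first-fit packing.
        for m in sorted(group, key=lambda m: m["mem_mb"], reverse=True):
            target = next(
                (g for g in gpus
                 if g["framework"] == framework and g["free"] >= m["mem_mb"]),
                None,
            )
            if target is None:  # no GPU of this framework has room: open a new one
                target = {"framework": framework, "free": GPU_MEMORY_MB, "models": []}
                gpus.append(target)
            target["models"].append(m["name"])
            target["free"] -= m["mem_mb"]
    return gpus

# Hypothetical model inventory for demonstration.
placements = place_models([
    {"name": "bert-large", "framework": "pytorch", "mem_mb": 4200},
    {"name": "resnet50", "framework": "tensorrt", "mem_mb": 1800},
    {"name": "gpt2", "framework": "pytorch", "mem_mb": 6000},
])
for i, gpu in enumerate(placements):
    print(f"GPU {i} ({gpu['framework']}): {gpu['models']}")
```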

Explore the Benefits of Triton Management Service

Simplified Deployment

Automates deploying and managing Triton Inference Server instances on Kubernetes and helps group models from different frameworks for efficient use of memory.
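For context, the sketch below uses the official Kubernetes Python client to create a single Triton Deployment by hand, the kind of object TMS creates and manages for you. The image tag, namespace, and model-repository URI are placeholders.

```python
from kubernetes import client, config

def deploy_triton(namespace="inference", model_repo="s3://my-bucket/models"):
    """Create one Triton Inference Server Deployment manually. TMS automates
    creating, updating, and tearing down many such Deployments."""
    config.load_kube_config()
    container = client.V1Container(
        name="triton",
        image="nvcr.io/nvidia/tritonserver:24.05-py3",  # placeholder tag
        command=["tritonserver"],
        args=[f"--model-repository={model_repo}"],
        # Triton's standard ports: 8000 HTTP, 8001 gRPC, 8002 metrics.
        ports=[client.V1ContainerPort(container_port=p) for p in (8000, 8001, 8002)],
        resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
    )
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="triton-server", labels={"app": "triton"}),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={"app": "triton"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "triton"}),
                spec=client.V1PodSpec(containers=[container]),
            ),
        ),
    )
    client.AppsV1Api().create_namespaced_deployment(namespace=namespace, body=deployment)
```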

Resource Maximization

Loads models on demand, unloads them when not in use via a lease system, and packs as many models as possible onto a single GPU server.
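The sketch below illustrates the lease idea with a toy manager built on Triton's model-control API (load_model/unload_model from tritonclient, which requires the server to run with --model-control-mode=explicit). The TTL value and the reaping loop are assumptions; TMS's actual lease protocol is more sophisticated than this.

```python
import time
import tritonclient.http as httpclient

LEASE_TTL_S = 300.0  # assumed: seconds a model stays loaded after its last use

class ModelLeaseManager:
    """Toy lease table: load a model on first request, unload it once its
    lease expires. Illustrates load-on-demand / unload-when-idle only."""

    def __init__(self, url="localhost:8000"):
        # Assumes a Triton server started with --model-control-mode=explicit.
        self.client = httpclient.InferenceServerClient(url=url)
        self.leases = {}  # model name -> lease expiry timestamp

    def acquire(self, model: str):
        """Load the model if needed and renew its lease."""
        if not self.client.is_model_ready(model):
            self.client.load_model(model)  # load on demand
        self.leases[model] = time.time() + LEASE_TTL_S

    def reap_expired(self):
        """Unload models whose leases have lapsed, freeing GPU memory."""
        now = time.time()
        for model, expiry in list(self.leases.items()):
            if now > expiry:
                self.client.unload_model(model)
                del self.leases[model]
```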

Monitoring and Autoscaling

Monitors each Triton Inference Server’s health and capacity, and autoscales based on latency and hardware utilization.
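To make the autoscaling idea concrete, here is a minimal control loop that bumps a Triton Deployment's replica count through the Kubernetes scale subresource when latency exceeds a target. The current_p95_latency_ms() helper, the SLO, and the replica cap are hypothetical; in practice such a signal would come from Triton's Prometheus metrics endpoint (port 8002) or a metrics backend.

```python
from kubernetes import client, config

LATENCY_SLO_MS = 100.0  # assumed p95 latency target
MAX_REPLICAS = 8        # assumed replica ceiling

def current_p95_latency_ms() -> float:
    """Hypothetical metric source: stubbed so the sketch stays
    self-contained. Replace with a real metrics query."""
    return 120.0

def autoscale(namespace="inference", deployment="triton-server"):
    config.load_kube_config()
    apps = client.AppsV1Api()
    scale = apps.read_namespaced_deployment_scale(deployment, namespace)
    replicas = scale.spec.replicas
    if current_p95_latency_ms() > LATENCY_SLO_MS and replicas < MAX_REPLICAS:
        replicas += 1  # over the latency budget: add a Triton instance
    apps.patch_namespaced_deployment_scale(
        deployment, namespace, body={"spec": {"replicas": replicas}})

if __name__ == "__main__":
    autoscale()
```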

Large-Scale Inference

Use Triton Management Service to efficiently manage inference deployments ranging from a single model to hundreds of models. Deploy on premises or in any public cloud.
