NVIDIA Triton Management Service (TMS) is a software application that simplifies AI inference by automating the deployment and management of multiple Triton Inference Server instances on Kubernetes with support for basic model orchestration.

TMS automates the deployment of Triton Inference Server instances on Kubernetes, the loading and unloading of models on GPUs and CPUs, and the efficient colocation of models.

This initial release of TMS is a pre-release version offered under an early-access program. It is considered alpha-quality software and is not recommended for production deployment.

Supported functionality for the alpha release:

  • Automates deploying and managing Triton on Kubernetes (k8s) with requested models
  • Enables more efficient GPU utilization by allowing multiple models to share the same Triton instance in a single pod
  • Unloads models when not in use
  • Groups models from different frameworks so that they can share an instance efficiently without running out of memory
  • Loads models from multiple sources, such as a secure registry or HTTPS
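
To make the idea of memory-aware model grouping more concrete, the following Python sketch packs models into as few instances as possible under a fixed memory budget, using a first-fit-decreasing heuristic. This is an illustration only and not a description of how TMS works internally: the `Model` and `Instance` classes, the `pack_models` function, the per-model memory estimates, and the 16,000 MB budget are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    name: str
    framework: str
    memory_mb: int  # hypothetical estimate of the memory the model needs


@dataclass
class Instance:
    capacity_mb: int  # hypothetical memory budget for one Triton instance
    models: list = field(default_factory=list)

    @property
    def used_mb(self) -> int:
        return sum(m.memory_mb for m in self.models)

    def fits(self, model: Model) -> bool:
        return self.used_mb + model.memory_mb <= self.capacity_mb


def pack_models(models, capacity_mb):
    """Group models into instances with first-fit decreasing.

    Models are placed largest first; each goes into the first instance
    that still has room, and a new instance is opened only when none does.
    """
    instances = []
    for model in sorted(models, key=lambda m: m.memory_mb, reverse=True):
        if model.memory_mb > capacity_mb:
            raise ValueError(f"{model.name} exceeds the instance budget")
        target = next((i for i in instances if i.fits(model)), None)
        if target is None:
            target = Instance(capacity_mb)
            instances.append(target)
        target.models.append(model)
    return instances


# Illustrative data: names, frameworks, and sizes are made up.
example_models = [
    Model("resnet", "tensorrt", 6000),
    Model("bert", "pytorch", 9000),
    Model("ssd", "tensorflow", 4000),
    Model("tiny", "onnx", 1000),
]
example_instances = pack_models(example_models, capacity_mb=16000)
for index, instance in enumerate(example_instances):
    names = [m.name for m in instance.models]
    print(f"instance {index}: {names} ({instance.used_mb} MB)")
```

With this made-up data, the four models from four different frameworks fit into two instances instead of four, and no instance exceeds its budget. A real scheduler would also need to account for factors this sketch ignores, such as GPU versus CPU placement and models being unloaded when idle.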

The early access version of NVIDIA Triton Management Service is expected to be available in June 2022.