NVIDIA Triton Management Service (TMS) is a software application that simplifies AI inference by automating the deployment and management of multiple Triton Inference Server instances on Kubernetes with support for basic model orchestration.

TMS manages and automates the deployment of Triton Inference Server instances on Kubernetes, model loading and unloading on GPUs and CPUs, and efficient colocation of models.

This initial release of TMS is a pre-release version offered under an early-access program. It is considered alpha-quality software and is not recommended for production deployment.

Supported functionality in the alpha release:

  • Automates deploying and managing Triton Inference Server on Kubernetes (k8s) with the requested models
  • Enables more efficient GPU utilization by allowing multiple models to share a single Triton instance in one pod
  • Unloads models when they are not in use
  • Groups models from different frameworks together so they coexist efficiently without out-of-memory issues
  • Loads models from multiple sources, such as a secure registry, HTTPS, etc.
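For context, deploying Triton on Kubernetes manually involves writing and maintaining manifests like the sketch below; this is the kind of boilerplate TMS is designed to generate and manage automatically. The image tag, model-repository path, and resource settings are illustrative assumptions, not TMS output.

```yaml
# Minimal sketch of a manual Triton Inference Server deployment on Kubernetes.
# The image tag and model-repository location are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton-server
  template:
    metadata:
      labels:
        app: triton-server
    spec:
      containers:
      - name: triton
        image: nvcr.io/nvidia/tritonserver:22.04-py3
        args: ["tritonserver", "--model-repository=s3://my-bucket/models"]
        ports:
        - containerPort: 8000   # HTTP inference endpoint
        - containerPort: 8001   # gRPC inference endpoint
        - containerPort: 8002   # Prometheus metrics
        resources:
          limits:
            nvidia.com/gpu: 1
```

With TMS, this per-instance manifest management, along with deciding which models share which Triton instance, is handled by the service rather than by hand.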

Please note that you must be a registered NVIDIA developer to join the program, and you must be logged in using your organization's email address. We cannot accept applications from accounts using Gmail, Yahoo, QQ, or similar personal email addresses.

To participate, please fill out the short application and provide details about your use case.

Please click "Join Now" to complete and submit the survey to be considered for the Triton Management Service (TMS) early-access program:

Join now

Participation is free. The early-access version of NVIDIA Triton Management Service is expected to be available in June 2022. You will be notified via email if you are accepted into the program.