MODEL ORCHESTRATION WITH NVIDIA TRITON’S MANAGEMENT SERVICE - EARLY ACCESS
NVIDIA Triton introduces Management Service, a software application service that automates the deployment of multiple Triton Inference Server instances in Kubernetes with GPU resource-efficient model orchestration.
The service manages model allocation to Triton Inference Server instances, model assignment to individual GPUs/CPUs, and efficient collocation of models from different frameworks.
This release of the management service is a pre-release version offered under the early-access program. It is considered alpha-quality software and is not recommended for production deployment. For instance, security features such as TLS are not yet supported.
New releases are published every month. The alpha release currently supports the following functionality:
- Automates deploying and managing Triton on Kubernetes (k8s) with requested models
- Avoids unnecessary Triton Inference Server instances by loading models onto already running Triton instances when possible
- Enables more efficient GPU utilization by allowing multiple models to share the same Triton instance in a single pod
- Unloads models when not in use
- Groups models from different frameworks together to ensure they coexist efficiently without out-of-memory issues
- Allows loading models from multiple sources, such as a secure registry, HTTPS, etc.
- Allows custom resource allocation per model or a set of models
- REST and JSON gRPC service
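The instance-sharing and unloading behavior listed above can be illustrated with a small sketch. This is not the Management Service's actual algorithm or API, which this page does not describe; it is a simple first-fit placement under stated assumptions. The names `Instance`, `place_model`, `unload_model`, the per-model memory figures, and the 16000 MiB capacity are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    """Hypothetical stand-in for one Triton instance with a GPU memory budget."""
    name: str
    capacity_mib: int
    models: Dict[str, int] = field(default_factory=dict)  # model name -> MiB

    @property
    def free_mib(self) -> int:
        return self.capacity_mib - sum(self.models.values())


def place_model(instances: List[Instance], model: str, needed_mib: int,
                capacity_mib: int = 16000) -> Instance:
    """First-fit placement: reuse a running instance when the model fits,
    otherwise start a new instance (assumed capacity: capacity_mib)."""
    if needed_mib > capacity_mib:
        raise ValueError(f"{model} needs more memory than one instance offers")
    for inst in instances:
        if inst.free_mib >= needed_mib:
            inst.models[model] = needed_mib
            return inst
    inst = Instance(name=f"triton-{len(instances)}", capacity_mib=capacity_mib)
    inst.models[model] = needed_mib
    instances.append(inst)
    return inst


def unload_model(instances: List[Instance], model: str) -> None:
    """Unload a model everywhere and drop instances that end up empty."""
    for inst in instances:
        inst.models.pop(model, None)
    instances[:] = [inst for inst in instances if inst.models]
```

In this sketch, a second model is placed on the first instance while memory allows, and a new instance is created only when no running instance has room. Checking free memory before placement is one simple way to avoid out-of-memory conditions when models share an instance.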
Please note that you must be a registered NVIDIA developer to join the program, and you must be logged in using your organization's email address. We cannot accept applications from accounts using Gmail, Yahoo, QQ, or other personal email addresses.
To participate, click "Join Now", then complete the short application and survey questions, including details about your use case, to be considered for the Triton Management Service (TMS) early access program.
There is no cost to participate. You will be notified via email if you are accepted into the early access program.