Dynamo-Triton
Dynamo-Triton, previously known as NVIDIA Triton Inference Server, is open-source software that standardizes AI model deployment and execution across every workload. It is part of the NVIDIA AI platform and is available with NVIDIA AI Enterprise.
Ways to Get Started With Dynamo-Triton
Find the right license to deploy, run, and scale AI inference for any application on any platform.
Purchase NVIDIA AI Enterprise
Purchase NVIDIA AI Enterprise, which includes Dynamo-Triton for production inference.
AI Enterprise Evaluation License
Apply to Try Dynamo-Triton on NVIDIA LaunchPad
Contact Us to Learn More About Purchasing Triton
Access Code for Development
Dynamo-Triton is available as open-source software on GitHub, with end-to-end examples.
Repository
Download Containers and Releases
Linux-based Dynamo-Triton containers for x86 and Arm® are available on NVIDIA NGC™.
Client libraries as well as binary releases of Dynamo-Triton for Windows and NVIDIA Jetson JetPack are available on GitHub.
Introductory Resources
Quick-Start Guide
Learn the basics for getting started with Dynamo-Triton, including how to create a model repository, launch Triton, and send an inference request.
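To make those steps concrete, here is a minimal client-side sketch, assuming a Triton server is already running on localhost:8000 and serving a hypothetical model named my_model with an FP32 input INPUT0 of shape [1, 16] and an output OUTPUT0. All names and shapes here are illustrative, not taken from the guide.

```python
# A hypothetical model repository layout expected by Triton:
#   model_repository/
#   └── my_model/
#       ├── config.pbtxt
#       └── 1/
#           └── model.onnx
# Launch the server against it, e.g.:
#   tritonserver --model-repository=/path/to/model_repository
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Input/output names, shapes, and dtypes must match the model's config.pbtxt.
inputs = [httpclient.InferInput("INPUT0", [1, 16], "FP32")]
inputs[0].set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))
outputs = [httpclient.InferRequestedOutput("OUTPUT0")]

result = client.infer(model_name="my_model", inputs=inputs, outputs=outputs)
print(result.as_numpy("OUTPUT0"))
```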
Introductory Blog
Read about how Dynamo-Triton helps simplify AI inference in production, the tools that help with Triton deployments, and ecosystem integrations.
Read Blog
Tutorials
Take a deeper dive into some of the concepts in Dynamo-Triton, along with examples of deploying a variety of common models.
Get Started
Content Kits
Access technical content on inference topics such as large language models, cloud deployments, and model ensembles.
Large Language Models
Large language models (LLMs) are an increasingly important class of deep learning models, and they require unique features to maximize their acceleration. This kit takes you through the LLM-oriented features of Dynamo-Triton and how to use them; a minimal PyTriton sketch follows the list below.
- Deploying a 1.3B GPT-3 Model With NVIDIA NeMo™ Framework
- Accelerated Inference for Large Transformer Models Using Dynamo-Triton
- Deploying GPT-J and T5 With Dynamo-Triton
- How to Deploy an AI Model in Python With PyTriton
- Deploying, Optimizing, and Benchmarking Large Language Models
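To give a flavor of the PyTriton approach covered in this kit, here is a minimal sketch of serving a plain Python function through Triton with the nvidia-pytriton package. The model name, tensor names, and the doubling function are illustrative placeholders, not part of any referenced tutorial.

```python
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

@batch
def infer_fn(data):
    # Placeholder for real model logic, e.g. an LLM forward pass.
    return {"result": data * 2.0}

with Triton() as triton:
    # Bind the Python function as a Triton model named "toy_model" (hypothetical).
    triton.bind(
        model_name="toy_model",
        infer_func=infer_fn,
        inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()  # blocks, exposing Triton's standard HTTP/gRPC endpoints
```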
Cloud Deployments
Dynamo-Triton includes many features and tools that help deploy deep learning at scale and in the cloud. With this kit, you can explore how to deploy Dynamo-Triton in different cloud and orchestration environments; a health-check sketch follows the list below.
- Run Multiple AI Models With Amazon SageMaker
- Boosting AI Model Inference Performance on Azure Machine Learning
- Deploying NVIDIA Triton at Scale With MIG and Kubernetes
- One-Click Deployment of Dynamo-Triton on GKE
- Harness the Power of Cloud-Ready AI Inference Solutions
- Generate Stunning Images with Stable Diffusion XL
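When wiring Triton into an orchestrator such as Kubernetes, liveness and readiness checks are the usual glue. Here is a minimal sketch using the Python HTTP client; the model name is a placeholder, and in practice Kubernetes probes typically hit Triton's /v2/health/live and /v2/health/ready HTTP endpoints directly.

```python
import tritonclient.http as httpclient

# The Python client exposes the same health checks that the
# /v2/health/live and /v2/health/ready endpoints serve over HTTP.
client = httpclient.InferenceServerClient(url="localhost:8000")

if client.is_server_live() and client.is_server_ready():
    # Gate traffic on a specific model being loaded ("my_model" is a placeholder).
    print("model ready:", client.is_model_ready("my_model"))
```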
Model Ensembles
Modern deep learning systems often require multiple models in a pipeline, along with accelerated pre- and post-processing steps. Learn how to implement these efficiently in Dynamo-Triton with model ensembles and Business Logic Scripting (BLS); a minimal BLS sketch follows the list below.
- Serving Machine Learning Model Pipelines on Dynamo-Triton With Ensemble Models
- Accelerating Inference With Dynamo-Triton and NVIDIA DALI
- How to Deploy HuggingFace’s Stable Diffusion Pipeline With Dynamo-Triton
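For a sense of what Business Logic Scripting looks like, here is a minimal sketch of a Python-backend model that calls another deployed model from inside its execute() function. Note that triton_python_backend_utils is only importable inside Triton's Python backend, and the model and tensor names below are placeholders.

```python
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Forward this model's input to another model in the repository.
            # "detector" is a placeholder; its input must also be named "IMAGE".
            image = pb_utils.get_input_tensor_by_name(request, "IMAGE")
            bls_request = pb_utils.InferenceRequest(
                model_name="detector",
                requested_output_names=["BOXES"],
                inputs=[image],
            )
            bls_response = bls_request.exec()
            if bls_response.has_error():
                raise pb_utils.TritonModelException(bls_response.error().message())
            boxes = pb_utils.get_output_tensor_by_name(bls_response, "BOXES")
            responses.append(pb_utils.InferenceResponse(output_tensors=[boxes]))
        return responses
```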
Self-Paced Training
Learn anytime, anywhere, with just a computer and an internet connection, through our self-paced course, Deploying a Model for Inference at Production Scale.