NVIDIA Triton Inference Server
NVIDIA Triton™ Inference Server, part of the NVIDIA AI platform and available with NVIDIA AI Enterprise, is open-source software that standardizes AI model deployment and execution across every workload.
Ways to Get Started With NVIDIA Triton Inference Server
Find the right license to deploy, run, and scale AI inference for any application on any platform.
Purchase NVIDIA AI Enterprise
Purchase NVIDIA AI Enterprise, which includes Triton Inference Server and Triton Management Service for production inference.
- AI Enterprise Evaluation License
- Apply to Try Triton Inference Server on NVIDIA LaunchPad
- Contact Us to Learn More About
Access Code for Development
Triton Inference Server is available as open-source software on GitHub with end-to-end examples.
Download Containers and Releases
Linux-based Triton Inference Server containers for x86 and Arm® are available on NVIDIA NGC™.
Client libraries as well as binary releases of Triton Inference Server for Windows and NVIDIA Jetson JetPack are available on GitHub.
Learn the basics for getting started with Triton Inference Server, including how to create a model repository, launch Triton, and send an inference request.
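The three steps named above (create a model repository, launch Triton, send an inference request) can be sketched as follows. This is a minimal illustration using only the Python standard library, not an official quick start. The model name `example_model`, the tensor names `INPUT0` and `OUTPUT0`, the ONNX Runtime backend, and the helper functions are all assumptions made for the example. The sketch lays out the directory structure Triton expects and builds the JSON body for the HTTP inference endpoint; it does not start a server or send anything.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical model and tensor names, chosen only for this illustration.
MODEL_NAME = "example_model"

# Minimal model configuration in protobuf text format.
# Assumes a single FP32 input and output of 4 elements, no batching.
CONFIG_PBTXT = """\
name: "example_model"
backend: "onnxruntime"
max_batch_size: 0
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
"""


def create_model_repository(root: Path) -> Path:
    """Create <root>/<model>/config.pbtxt and the version directory <root>/<model>/1/.

    The model file itself (for example model.onnx inside 1/) is not created
    here; it has to be supplied separately.
    """
    model_dir = root / MODEL_NAME
    (model_dir / "1").mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(CONFIG_PBTXT)
    return model_dir


def build_infer_request(values: list) -> tuple:
    """Return the URL path and JSON-serializable body for an HTTP inference request."""
    path = f"/v2/models/{MODEL_NAME}/infer"
    body = {
        "inputs": [
            {
                "name": "INPUT0",
                "shape": [len(values)],
                "datatype": "FP32",
                "data": values,
            }
        ]
    }
    return path, body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        create_model_repository(repo)
        # Show the layout that would be passed to the server, e.g.:
        #   tritonserver --model-repository=<repo>
        for p in sorted(repo.rglob("*")):
            print(p.relative_to(repo))

    path, body = build_infer_request([1.0, 2.0, 3.0, 4.0])
    # With a running server, this body would be POSTed to
    # http://localhost:8000 + path (8000 is Triton's default HTTP port).
    print(path)
    print(json.dumps(body, indent=2))
```

In practice the request would usually be sent with one of the Triton client libraries rather than hand-built JSON; the plain dictionary is used here only to make the shape of the request visible.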
Read about how Triton Inference Server helps simplify AI inference in production, the tools that help with Triton deployments, and ecosystem integrations.
Access technical content on inference topics such as large language models, cloud deployments, and model ensembles.
Large Language Models
Large language models (LLMs) are an increasingly important class of deep learning models, and they require specialized features to maximize their acceleration. This kit takes you through the features of Triton Inference Server built around LLMs and how to use them.
- Deploying a 1.3B GPT-3 Model With NVIDIA NeMo™ Framework
- Accelerated Inference for Large Transformer Models Using NVIDIA Triton Inference Server
- How to Deploy an AI Model in Python With PyTriton
Cloud Deployments
Triton Inference Server includes many features and tools to help deploy deep learning at scale and in the cloud. With this kit, you can explore how to deploy Triton Inference Server in different cloud and orchestration environments.
- Run Multiple AI Models on the Same GPU With Amazon SageMaker Multi-Model Endpoints Powered by NVIDIA Triton Inference Server
- Boosting AI Model Inference Performance on Azure Machine Learning
- Deploying NVIDIA Triton at Scale With MIG and Kubernetes
- One-Click Deployment of NVIDIA Triton Inference Server to Simplify AI Inference on Google Kubernetes Engine (GKE)
Model Ensembles
Modern deep learning systems often require multiple models in a pipeline, along with accelerated pre- and postprocessing steps. Learn how to implement these efficiently in Triton Inference Server with model ensembles and business logic scripting.
- Serving Machine Learning Model Pipelines on NVIDIA Triton Inference Server With Ensemble Models
- Accelerating Inference With NVIDIA Triton Inference Server and NVIDIA DALI
- How to Deploy HuggingFace’s Stable Diffusion Pipeline With Triton Inference Server
Learn anytime, anywhere, with just a computer and an internet connection through our Deploying a Model for Inference at Production Scale self-paced course.
Stay up to date on the latest inference news from NVIDIA.
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.