NVIDIA TensorRT Inference Server Available Now
Sep 25, 2018

The NVIDIA TensorRT inference server GA version is now available for download as a container from the NVIDIA GPU Cloud container registry.
Announced at GTC Japan and part of the NVIDIA TensorRT Hyperscale Inference Platform, the TensorRT inference server is a containerized microservice for data center production deployments.
As more and more applications leverage AI, it has become vital to provide inference capabilities in production environments. Just as an application might call a web server to retrieve HTML content, modern applications need to access inference the same way, via a simple API call. But existing solutions are often custom-built for a specific application rather than for general-purpose production use, and they aren't optimized to get the most out of GPUs, which limits their usefulness.
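To make the "inference as an API call" idea concrete, here is a minimal Python sketch of an application sending an image to an inference server over HTTP. The host, port, endpoint path, model name, and JSON payload layout are illustrative assumptions, not the server's documented API; consult the TensorRT inference server client examples for the exact request format your release expects.

```python
# Minimal sketch: an application treating inference like any other web service call.
# NOTE: the endpoint path, model name, and payload layout below are illustrative
# assumptions -- check the TensorRT inference server client docs for the real API.
import numpy as np
import requests

SERVER_URL = "http://localhost:8000"   # assumed host/port of the inference server
MODEL_NAME = "resnet50"                # hypothetical model deployed on the server

def classify(image: np.ndarray) -> dict:
    """Send one preprocessed image to the server and return its response."""
    payload = {
        "model": MODEL_NAME,
        "inputs": [{"name": "input",
                    "shape": list(image.shape),
                    "data": image.flatten().tolist()}],
    }
    resp = requests.post(f"{SERVER_URL}/infer/{MODEL_NAME}", json=payload, timeout=5.0)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    dummy_image = np.random.rand(3, 224, 224).astype(np.float32)
    print(classify(dummy_image))
```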
The TensorRT inference server provides production-quality inference capabilities in a ready-to-run container. It maximizes utilization by supporting multiple models per GPU, so every GPU can service any incoming request, eliminating the bottleneck of previous solutions that supported only a single model per GPU. It supports all popular AI frameworks, so data scientists can develop their models in the best framework for the job. And the TensorRT inference server integrates seamlessly into DevOps deployments that use Docker and Kubernetes.
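Because one server instance can host many models, a deployment pipeline typically verifies that the container is up and that every expected model is loaded before routing traffic to it. The sketch below shows that pattern in Python; the health and status endpoint paths, the shape of the status response, and the model names are all assumptions for illustration, so check the server documentation for the actual API in your release.

```python
# Sketch of a readiness check a DevOps pipeline (or a Kubernetes probe script) might
# run before sending traffic to a freshly started inference server container.
# NOTE: the endpoint paths, response shape, and model names are illustrative
# assumptions; consult the server documentation for the exact health/status API.
import sys
import requests

SERVER_URL = "http://localhost:8000"           # assumed host/port of the server
EXPECTED_MODELS = ["resnet50", "bert_base"]    # hypothetical models in the repository

def server_ready() -> bool:
    """Return True if the server answers its (assumed) readiness route."""
    try:
        return requests.get(f"{SERVER_URL}/health/ready", timeout=2.0).status_code == 200
    except requests.RequestException:
        return False

def models_loaded() -> bool:
    """Return True if every expected model appears in the (assumed) status response."""
    status = requests.get(f"{SERVER_URL}/status", timeout=2.0).json()
    loaded = {m["name"] for m in status.get("models", [])}
    return all(name in loaded for name in EXPECTED_MODELS)

if __name__ == "__main__":
    ok = server_ready() and models_loaded()
    print("ready" if ok else "not ready")
    sys.exit(0 if ok else 1)
```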
With the NVIDIA TensorRT inference server, there’s now a common solution for AI inference, allowing researchers to focus on creating high-quality trained models, DevOps engineers to focus on deployment, and developers to focus on their applications without needing to reinvent the AI plumbing over and over again.
Download the TensorRT inference server from the NVIDIA GPU Cloud container registry now.
Learn how to use the TensorRT inference server in this NVIDIA Developer Blog post.