NVIDIA TensorRT Inference Server Available Now
Sep 25, 2018

The NVIDIA TensorRT inference server GA version is now available for download as a container from the NVIDIA GPU Cloud container registry.
Announced at GTC Japan and part of the NVIDIA TensorRT Hyperscale Inference Platform, the TensorRT inference server is a containerized microservice for data center production deployments.
As more and more applications leverage AI, it has become vital to provide inference capabilities in production environments. Just as an application might call a web server to retrieve HTML content, modern applications need to access inference the same way, via a simple API call. But existing solutions are often custom-built for a specific application rather than for general-purpose production use, and they aren't optimized to get the most out of GPUs, which limits their usefulness.
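To make the "inference as an API call" idea concrete, here is a minimal Python sketch of an application sending an image to an inference server over HTTP. The host, port, endpoint path, model name, and JSON payload layout are illustrative assumptions, not the server's documented API; consult the TensorRT inference server client examples for the exact request format your release expects.

```python
# Minimal sketch: an application treating inference like any other web service call.
# NOTE: the endpoint path, model name, and payload layout below are illustrative
# assumptions -- check the TensorRT inference server client docs for the real API.
import numpy as np
import requests

SERVER_URL = "http://localhost:8000"   # assumed host/port of the inference server
MODEL_NAME = "resnet50"                # hypothetical model deployed on the server

def classify(image: np.ndarray) -> dict:
    """Send one preprocessed image to the server and return its response."""
    payload = {
        "model": MODEL_NAME,
        "inputs": [{"name": "input",
                    "shape": list(image.shape),
                    "data": image.flatten().tolist()}],
    }
    resp = requests.post(f"{SERVER_URL}/infer/{MODEL_NAME}", json=payload, timeout=5.0)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    dummy_image = np.random.rand(3, 224, 224).astype(np.float32)
    print(classify(dummy_image))
```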
The TensorRT inference server provides production-quality inference capabilities in a ready-to-run container. It maximizes utilization by supporting multiple models per GPU, so every GPU can service any incoming request, eliminating the bottleneck of previous solutions that supported only a single model per GPU. It supports all popular AI frameworks, so data scientists can develop their models in the best framework for the job. And the TensorRT inference server integrates seamlessly into DevOps deployments that use Docker and Kubernetes.
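Because one server instance can host many models, a deployment pipeline typically verifies that the container is up and that every expected model is loaded before routing traffic to it. The sketch below shows that pattern in Python; the health and status endpoint paths, the shape of the status response, and the model names are all assumptions for illustration, so check the server documentation for the actual API in your release.

```python
# Sketch of a readiness check a DevOps pipeline (or a Kubernetes probe script) might
# run before sending traffic to a freshly started inference server container.
# NOTE: the endpoint paths, response shape, and model names are illustrative
# assumptions; consult the server documentation for the exact health/status API.
import sys
import requests

SERVER_URL = "http://localhost:8000"           # assumed host/port of the server
EXPECTED_MODELS = ["resnet50", "bert_base"]    # hypothetical models in the repository

def server_ready() -> bool:
    """Return True if the server answers its (assumed) readiness route."""
    try:
        return requests.get(f"{SERVER_URL}/health/ready", timeout=2.0).status_code == 200
    except requests.RequestException:
        return False

def models_loaded() -> bool:
    """Return True if every expected model appears in the (assumed) status response."""
    status = requests.get(f"{SERVER_URL}/status", timeout=2.0).json()
    loaded = {m["name"] for m in status.get("models", [])}
    return all(name in loaded for name in EXPECTED_MODELS)

if __name__ == "__main__":
    ok = server_ready() and models_loaded()
    print("ready" if ok else "not ready")
    sys.exit(0 if ok else 1)
```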
With the NVIDIA TensorRT inference server, there’s now a common solution for AI inference, allowing researchers to focus on creating high-quality trained models, DevOps engineers to focus on deployment, and developers to focus on their applications without needing to reinvent the AI plumbing over and over again.
Download the TensorRT inference server from the NVIDIA GPU Cloud container registry now.
Learn how to use the TensorRT inference server in this NVIDIA Developer Blog post.