The NVIDIA Triton Inference Server, previously known as TensorRT Inference Server, is now available from NVIDIA NGC or via GitHub.
The NVIDIA Triton Inference Server helps developers and IT/DevOps teams easily deploy a high-performance inference server in the cloud, in an on-premises data center, or at the edge. The server provides an inference service via an HTTP/REST or gRPC endpoint, allowing clients to request inference for any model being managed by the server.
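To illustrate what a client request looks like, here is a minimal sketch using the tritonclient Python package against the HTTP/REST endpoint. The model name, tensor names, and shape are placeholders and must match your own model's configuration.

```python
# Minimal sketch of a client request over Triton's HTTP/REST endpoint.
# "my_model", the tensor names, and the shape are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: one FP32 input tensor filled with random data.
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input", list(input_data.shape), "FP32")
infer_input.set_data_from_numpy(input_data)

# Send the inference request and read back the named output tensor.
response = client.infer(
    model_name="my_model",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output")],
)
print(response.as_numpy("output").shape)
```

The same request can be issued over gRPC by swapping in the tritonclient.grpc module; the client-side code is otherwise nearly identical.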
Developers and AI companies use NVIDIA Triton Inference Server to deploy models from different framework backends such as TensorFlow, TensorRT, PyTorch and ONNX Runtime.
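Models from these different backends are served from a single model repository, where each model directory holds versioned model files and a configuration describing its inputs and outputs. The layout and configuration below are a sketch; the model name, tensor names, and dimensions are placeholders.

```
model_repository/
└── my_tensorrt_model/
    ├── config.pbtxt          # model configuration
    └── 1/                    # version 1 of the model
        └── model.plan        # TensorRT engine (SavedModel, TorchScript, or ONNX files live in their own model directories)
```

```
# config.pbtxt (placeholder values)
name: "my_tensorrt_model"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```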
One organization using the NVIDIA Triton Inference Server is Tracxpoint, a leading global provider of next-generation self-checkout grocery solutions.
The company is working to make in-store retail experiences as streamlined as online retail. To do this, they use deep learning to perform object detection on shopping items placed in a cart, present personalized real-time offers from vendors to customers, and provide navigation through shopping aisles.
Tracxpoint uses the NVIDIA Triton Inference Server to deploy and serve multiple models from different frameworks such as TensorFlow and TensorRT. The NVIDIA Triton Inference Server gives them the flexibility to update retrained models seamlessly without any application restarts or disruption to the user.
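As a sketch of how such a seamless update can be driven, the snippet below assumes the server was started with explicit model control enabled (for example, --model-control-mode=explicit) and that a retrained version has already been copied into the model repository. The model name is a placeholder. Alternatively, the server can be started in poll mode so it picks up repository changes automatically.

```python
# Minimal sketch of refreshing a retrained model without restarting the server,
# assuming tritonserver runs with --model-control-mode=explicit.
# "my_model" is a placeholder; the new version (e.g. my_model/2/) is placed in
# the model repository before the load call.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Ask Triton to (re)load the model from the repository while the server keeps running.
client.load_model("my_model")
print(client.is_model_ready("my_model"))
```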
NVIDIA Triton Inference Server is also part of the open-source inference platforms Kubeflow and KFServing, and will be one of the first inference servers to adopt the new KFServing V2 API.
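For a sense of what the KFServing V2 protocol looks like over REST, here is a sketch assuming a Triton release that implements it. The model name, tensor name, shape, and data are placeholders.

```python
# Minimal sketch of the KFServing V2 REST protocol as exposed on Triton's HTTP port.
# "my_model" and the tensor description are placeholders.
import requests

# Readiness check: returns HTTP 200 when the server is ready to serve requests.
assert requests.get("http://localhost:8000/v2/health/ready").status_code == 200

# Inference: POST /v2/models/<model_name>/infer with a JSON body describing the inputs.
payload = {
    "inputs": [
        {
            "name": "input",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4],
        }
    ]
}
response = requests.post("http://localhost:8000/v2/models/my_model/infer", json=payload)
print(response.json())
```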
For more information about the NVIDIA Triton Inference Server, visit the NVIDIA inference web page, GitHub, and NGC.
Also, watch this GTC Digital live webinar, Deep into Triton Inference Server: BERT Practical Deployment on NVIDIA GPU, to learn more.