Once you have built, trained, tweaked, and tuned your deep learning model, you need an inference solution that you can deploy to a datacenter or to the cloud, and you want it to deliver the maximum possible performance. You may have heard that NVIDIA TensorRT can maximize inference performance on NVIDIA GPUs, but how do you get from your trained model to a TensorRT-based inference engine in your datacenter or in the cloud? The new TensorRT container can help you solve this problem.
Based on NVIDIA Docker, the TensorRT container encapsulates all the libraries, executables and drivers you need to develop a TensorRT-based inference application. In just a few minutes you can go from nothing to having a local development environment for your inference solution that can also act as the basis for your own container-based datacenter or cloud deployment.
A new NVIDIA Developer Blog post introduces the TensorRT container and describes the simple REST server included in the container, which can act as a basis or inspiration for your own deployment solution.
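To make the idea of a REST front end for inference concrete, here is a minimal sketch using only the Python standard library. It is not the server shipped in the TensorRT container: the `/classify` path, the JSON payload shape, and the `run_inference` function are all hypothetical, and `run_inference` is a stub that stands in for a call into a real TensorRT engine.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_inference(values):
    """Hypothetical stand-in for a TensorRT engine call.

    Returns the index of the largest input value, as a classifier's
    argmax over output scores would.
    """
    return {"class_index": max(range(len(values)), key=values.__getitem__)}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumed endpoint name; a real deployment would choose its own.
        if self.path != "/classify":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(run_inference(payload["values"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo quiet; remove to see per-request logs.
        pass


def classify(port, values):
    """POST a list of numbers to the local server and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/classify",
        data=json.dumps({"values": values}).encode(),
        headers={"Content-Type": "application/json"},
    )
    # An empty ProxyHandler ensures the loopback call ignores any proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())


def demo():
    """Start the server on a free port, send one request, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), InferenceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return classify(server.server_port, [0.1, 0.7, 0.2])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

In a real deployment, the body of `run_inference` would hand the request data to a TensorRT engine, and the server would run inside the container rather than on a throwaway local port.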
RESTful Inference with the TensorRT Container and NVIDIA GPU Cloud
Dec 05, 2017
Related resources
- GTC session: Optimizing Inference Performance and Incorporating New LLM Features in Desktops and Workstations
- GTC session: Speeding up LLM Inference With TensorRT-LLM
- GTC session: Optimizing and Scaling LLMs With TensorRT-LLM for Text Generation
- NGC Containers: NVIDIA MLPerf Inference
- SDK: TensorRT