GTC-DC 2019: Using the TensorRT Inference Server to Cut GPU Costs and Simplify Model Deployment (Presented by CACI)
Kyle Pula, CACI
GTC-DC 2019
We’ll share lessons from our experience scaling AI from research to production with the TensorRT Inference Server (TRTIS), aimed at engineers and managers dissatisfied with the complexity or cost efficiency of their model deployment architectures. We’ll show how TRTIS deploys a library of models across a collection of GPUs, reducing the amount of custom inference code to maintain and making better use of existing GPU resources. TRTIS also provides options for batching, ensemble models, and custom backends. Both research and production teams benefit from offloading some of this inference complexity to TRTIS.
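To give a sense of what client-side code looks like once models are served this way, below is a minimal sketch using the Python client for the server (TRTIS was later renamed Triton Inference Server, and the current client package is tritonclient). The model name resnet50, the tensor names input__0 and output__0, and the localhost:8000 endpoint are placeholder assumptions; they must match whatever your model repository and server configuration actually define.

```python
# Minimal sketch: send one inference request to a TRTIS / Triton server over HTTP.
# Assumes the server is running locally on port 8000 and serving a hypothetical
# model named "resnet50" with tensors "input__0" (FP32 image batch) and "output__0".
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Describe the input tensor and attach a dummy batch of data.
image_batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input__0", list(image_batch.shape), "FP32")
infer_input.set_data_from_numpy(image_batch)

# Request the output tensor and run inference; the server handles batching,
# scheduling, and GPU placement according to the model's configuration.
infer_output = httpclient.InferRequestedOutput("output__0")
response = client.infer(
    model_name="resnet50",
    inputs=[infer_input],
    outputs=[infer_output],
)

print(response.as_numpy("output__0").shape)
```

Because the server owns batching and GPU placement, client code like this stays the same whether one model or a whole library of models is deployed behind it.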