GTC-DC 2019: Using the TensorRT Inference Server to Cut GPU Costs and Simplify Model Deployment (Presented by CACI)

Kyle Pula, CACI
GTC DC 2019
We’ll share lessons learned from scaling AI from research to production with the TensorRT Inference Server (TRTIS), aimed at engineers and managers dissatisfied with the complexity or cost efficiency of their model deployment architectures. We’ll show how TRTIS serves a library of models across a collection of GPUs, reducing the amount of custom inference code that must be maintained and making better use of existing GPU resources. TRTIS also provides options for batching, ensemble models, and custom backends. Research and production teams alike will benefit from offloading much of this inference complexity to TRTIS.
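
As a concrete illustration of the batching and multi-GPU options mentioned above, here is a minimal sketch of the per-model configuration file (config.pbtxt) that TRTIS reads from its model repository. The model name, tensor names, shapes, and batching parameters below are illustrative assumptions, not values from the talk.

    # config.pbtxt -- illustrative sketch only; names and sizes are assumed
    name: "example_classifier"        # hypothetical model name
    platform: "tensorrt_plan"         # a serialized TensorRT engine
    max_batch_size: 32                # upper bound for server-side batching
    input [
      {
        name: "input"
        data_type: TYPE_FP32
        dims: [ 3, 224, 224 ]
      }
    ]
    output [
      {
        name: "probabilities"
        data_type: TYPE_FP32
        dims: [ 1000 ]
      }
    ]
    # Run two copies of the model on each available GPU
    instance_group [
      {
        count: 2
        kind: KIND_GPU
      }
    ]
    # Let the server coalesce individual requests into larger batches
    dynamic_batching {
      preferred_batch_size: [ 8, 16 ]
      max_queue_delay_microseconds: 100
    }

With one such directory per model in the repository, a single TRTIS instance can serve the whole library over its HTTP or gRPC endpoints, which is what removes the need for separate, hand-maintained serving code for each model.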