
GTC Silicon Valley 2019 | ID: S9438: Maximizing Utilization for Data Center Inference with TensorRT Inference Server

David Goodwin (NVIDIA), Soyoung Jeong (NVIDIA)
As the use of AI has increased, so has the need for a production-quality AI inference solution. We'll discuss the latest additions to NVIDIA's TensorRT Inference Server and describe deployment examples to help you plan your data center's production inference architecture. TensorRT Inference Server makes it possible to serve inference efficiently in applications without reinventing the wheel. We'll talk about how the inference server supports the top AI frameworks as well as custom backends, and how it maximizes utilization by hosting multiple models per GPU and across GPUs with dynamic request batching. Our talk will also cover how the inference server supports Kubernetes with health and latency metrics and integrates with Kubeflow for simplified deployment.
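For context on the dynamic request batching and multi-model hosting mentioned above: each model served by TensorRT Inference Server is described by a config.pbtxt file in the server's model repository, where instance counts per GPU and batching behavior can be set. The sketch below is illustrative only; the model name, tensor shapes, and parameter values are hypothetical and not taken from the talk.

  # config.pbtxt for a hypothetical TensorRT plan model
  name: "resnet50_plan"
  platform: "tensorrt_plan"
  max_batch_size: 32

  input [
    {
      name: "input"
      data_type: TYPE_FP32
      dims: [ 3, 224, 224 ]
    }
  ]
  output [
    {
      name: "prob"
      data_type: TYPE_FP32
      dims: [ 1000 ]
    }
  ]

  # Run two instances of this model on each available GPU,
  # so the server can host multiple models/instances per GPU.
  instance_group [
    {
      count: 2
      kind: KIND_GPU
    }
  ]

  # Let the server combine individual requests into larger batches,
  # waiting at most a short time to form a preferred batch size.
  dynamic_batching {
    preferred_batch_size: [ 4, 8 ]
    max_queue_delay_microseconds: 100
  }

With a configuration along these lines, the server batches concurrent client requests together before executing the model, which is how it raises GPU utilization without requiring clients to batch requests themselves.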

View the slides (pdf)