GTC-DC 2019: Running TensorRT At Scale on Google Kubernetes Engine (Presented by Google)

Hallie Crosby, Google


We’ll show how running the TensorRT Inference Server on Google Kubernetes Engine (GKE) lets customers scale deep learning workloads and access the latest NVIDIA GPUs. GKE is a managed, production-ready environment for deploying containerized applications. A wide variety of applications can be deployed on it to operate with high availability and scale to meet demand, all while running securely on Google’s network. With Anthos, customers can extend these capabilities to on-premises and multicloud environments. We’ll show how applications can be deployed and managed on any cloud without requiring administrators and developers to learn different environments and application programming interfaces. We’ll also explore applications where this capability is paramount.