
GTC-DC 2019: Conversational AI Inference Deployment using TensorRT Inference Server

Michael Demoret, NVIDIA
We’ll demonstrate a complete speech pipeline that implements automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) as an online streaming speech service using the NVIDIA TensorRT Inference Server (TRTIS). TRTIS maximizes GPU utilization and simplifies deploying AI models at scale. It supports all popular AI frameworks, multi-GPU models, enhanced metrics, and dynamic batching, and it integrates seamlessly into DevOps deployments using Docker, Kubernetes, Prometheus, and Kubeflow. The server can help deploy ASR, NLP, recommendation systems, and object detection. In this conversational AI demonstration, the inference server combines common deep learning frameworks with custom backends and stitches together the pipeline’s additional components using model ensembling. We’ll show that TRTIS can simplify deep learning deployments in the cloud and in data centers.
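As context for the model ensembling mentioned above: in TRTIS, an ensemble is itself a model whose config.pbtxt contains an ensemble_scheduling block that routes tensors between sub-models, so the whole pipeline runs server-side from a single request. Below is a minimal sketch of such a configuration for a three-stage ASR, NLP, and TTS pipeline; the model names (asr_model, nlp_model, tts_model) and all tensor names are illustrative assumptions, not the actual models from this session.

    # config.pbtxt for a hypothetical "speech_pipeline" ensemble model
    # (protobuf text format; model and tensor names are assumptions)
    name: "speech_pipeline"
    platform: "ensemble"
    max_batch_size: 1
    input [
      { name: "AUDIO_IN", data_type: TYPE_FP32, dims: [ -1 ] }
    ]
    output [
      { name: "AUDIO_OUT", data_type: TYPE_FP32, dims: [ -1 ] }
    ]
    ensemble_scheduling {
      step [
        {
          # ASR stage: raw audio -> transcript text
          model_name: "asr_model"
          model_version: -1
          input_map { key: "AUDIO", value: "AUDIO_IN" }
          output_map { key: "TEXT", value: "transcript" }
        },
        {
          # NLP stage: transcript -> response text
          model_name: "nlp_model"
          model_version: -1
          input_map { key: "TEXT_IN", value: "transcript" }
          output_map { key: "TEXT_OUT", value: "response_text" }
        },
        {
          # TTS stage: response text -> synthesized audio
          model_name: "tts_model"
          model_version: -1
          input_map { key: "TEXT", value: "response_text" }
          output_map { key: "AUDIO", value: "AUDIO_OUT" }
        }
      ]
    }

A client then sends one inference request to speech_pipeline, and the server executes the three stages in sequence on the GPU, passing the intermediate tensors (transcript, response_text) internally rather than round-tripping them through the client.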
