Deep into Triton Inference Server: BERT Practical Deployment on NVIDIA GPU
Tianhao Xu, NVIDIA
GTC 2020
We'll give an overview of the TensorRT Hyperscale Inference Platform. We'll start with a deep dive into its current features and internal architecture, then cover deployment options in a generic deployment ecosystem. Next, we'll give a hands-on overview of NVIDIA BERT, FasterTransformer, and TRT-optimized BERT inference. Then we'll get into how to deploy a BERT TensorFlow model with a custom op, how to deploy a BERT TensorRT model with plugins, and how to benchmark both. We'll finish with other optimization techniques and an open discussion.
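To give a flavor of the client side of such a deployment, below is a minimal sketch of sending a request to a BERT model served by Triton, using the tritonclient Python package over HTTP. The model name ("bert_trt") and the tensor names and shapes ("input_ids", "segment_ids", "input_mask", "logits") are illustrative assumptions; the actual names depend on how the model was exported and declared in its config.pbtxt.

```python
# Minimal sketch: query a Triton-served BERT model over HTTP.
# Assumes a Triton server running locally on the default HTTP port 8000,
# with a model named "bert_trt"; all tensor names below are hypothetical.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

seq_len = 128
# Dummy tokenized input; a real client would run a BERT tokenizer first.
input_ids = np.zeros((1, seq_len), dtype=np.int32)
segment_ids = np.zeros((1, seq_len), dtype=np.int32)
input_mask = np.ones((1, seq_len), dtype=np.int32)

inputs = []
for name, data in [("input_ids", input_ids),
                   ("segment_ids", segment_ids),
                   ("input_mask", input_mask)]:
    inp = httpclient.InferInput(name, list(data.shape), "INT32")
    inp.set_data_from_numpy(data)
    inputs.append(inp)

result = client.infer(model_name="bert_trt", inputs=inputs)
logits = result.as_numpy("logits")
print(logits.shape)
```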