Accelerating GNMT Inference on GPU
Maxim Milakov, NVIDIA | Jeremy Appleyard, NVIDIA
GTC 2020
Google Neural Machine Translation (GNMT) is one of the benchmarks in the MLPerf inference benchmark suite, representing Seq2Seq models. The benchmark measures throughput under latency constraints. We'll go through the challenges we faced at NVIDIA when implementing the GNMT benchmark and how we solved them on NVIDIA GPUs using the optimized and customizable TensorRT library. You'll learn the tricks we used to optimize the GNMT model, many of which are applicable to other auto-regressive models and to DL inference on GPUs in general.
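To give a sense of the TensorRT workflow the talk builds on, the sketch below shows a minimal engine build from an ONNX export with FP16 enabled, using the TensorRT Python API. This is an illustration only, not the implementation presented in the session: the file name "gnmt_encoder.onnx" and the function name build_engine are hypothetical placeholders, and the actual MLPerf submission uses a more heavily customized TensorRT setup than a plain ONNX parse.

```python
# Minimal sketch (assumption: TensorRT 8.x Python API is available and the
# model has been exported to ONNX; "gnmt_encoder.onnx" is a placeholder path).
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path: str) -> trt.ICudaEngine:
    """Parse an ONNX file and build a TensorRT engine with FP16 enabled."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            # Surface parser errors before giving up.
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # reduced precision for higher throughput

    serialized = builder.build_serialized_network(network, config)
    runtime = trt.Runtime(TRT_LOGGER)
    return runtime.deserialize_cuda_engine(serialized)

if __name__ == "__main__":
    engine = build_engine("gnmt_encoder.onnx")  # hypothetical export of one GNMT component
    # Inference would then use engine.create_execution_context() with device buffers.
```

In an auto-regressive model like GNMT, the decoder step runs inside a loop, so per-step overhead and kernel launch latency matter far more than in a single-pass network; this is the kind of consideration the session's optimization tricks address.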