After clicking “Watch Now” you will be prompted to login or join.


Click “Watch Now” to login or join the NVIDIA Developer Program.


Faster Transformer

Bo Yang Hsueh, NVIDIA

GTC 2020

Recently, models such as BERT and XLNet, which adopt a stack of transformer layers as key components, show breakthrough performance in various deep learning tasks. Consequently, the inference performance of the transformer layer greatly limits the possibility that such models can be adopted in online services. First, we'll show how Faster Transformer optimizes the inference computation of both the transformer encoder and decoder layers. In addition to optimizations on the standard transformer, we'll get into how to customize Faster Transformer to accelerate a pruned transformer encoder layer together with the CUTLASS library.

View More GTC 2020 Content