GTC Silicon Valley-2019: Optimize Deep FSMN Network


GTC Silicon Valley-2019 ID:S9113:Optimize Deep FSMN Network

Yongchao Liu (Ant Financial), Jun Yang (Alibaba)
Learn how to speed up a deep feedforward sequential memory network (FSMN) on Volta. We'll describe how to use Tensor Cores to speed up GEMM operations and explain how to optimize an FSMN kernel by increasing its locality and reducing its math workload. Although RNNs are a powerful tool for sequence-to-sequence problems, their recurrent structure increases computational complexity. As an alternative, FSMN can effectively model long-term dependencies without using any recurrent structure. We'll show how a GPU-friendly FSMN can outperform RNNs in both accuracy and speed. Our work is based on Alibaba's deep FSMN model.
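To illustrate why an FSMN has no recurrent structure, here is a minimal sketch of an FSMN-style memory block in plain Python. It is not the speakers' kernel and not Alibaba's deep FSMN model. It assumes the commonly described formulation in which each output frame is the current hidden frame plus per-dimension weighted sums of a few past and future hidden frames. The function name `fsmn_memory_block`, the parameter names `lookback` and `lookahead`, and the zero-padding at sequence boundaries are all assumptions made for illustration.

```python
def fsmn_memory_block(h, lookback, lookahead):
    """Apply an FSMN-style memory block to a sequence of hidden frames.

    h         -- list of T frames, each a list of D floats
    lookback  -- list of coefficient vectors (length D each); lookback[i-1]
                 weights frame h[t-i]
    lookahead -- list of coefficient vectors (length D each); lookahead[j-1]
                 weights frame h[t+j]
    Frames outside the sequence are treated as zero (assumption).
    """
    T = len(h)
    D = len(h[0]) if T else 0
    out = []
    for t in range(T):
        m = list(h[t])  # start from the current frame
        # Weighted sum over past frames h[t-1], h[t-2], ...
        for i, a in enumerate(lookback, start=1):
            if t - i >= 0:
                for d in range(D):
                    m[d] += a[d] * h[t - i][d]
        # Weighted sum over future frames h[t+1], h[t+2], ...
        for j, c in enumerate(lookahead, start=1):
            if t + j < T:
                for d in range(D):
                    m[d] += c[d] * h[t + j][d]
        out.append(m)
    return out


# Three one-dimensional frames, one look-back tap and one look-ahead tap.
frames = [[1.0], [2.0], [3.0]]
result = fsmn_memory_block(frames, lookback=[[0.5]], lookahead=[[0.25]])
print(result)  # [[1.5], [3.25], [4.0]]
```

Each output frame reads only the input frames, never an earlier output frame, so the loop over `t` has no sequential dependency and every time step can be computed in parallel. An RNN, by contrast, must finish step `t-1` before it can start step `t`.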