GTC Silicon Valley 2019, Session S9389: Structural Sparsity: Speeding Up Training and Inference of Neural Networks by Linear Algorithms
Alexander Keller (NVIDIA), Matthijs Van Keirsbilck (NVIDIA), Xiaodong Yang (NVIDIA)
Learn how to achieve real-world speedups of neural networks using structural sparsity, which reduces the number of weights and computations in a way that is well suited to hardware acceleration. Over-parameterized neural networks waste memory and energy. Techniques such as pruning or factorization can alleviate this during inference, but they often increase training time, and real-world speedups remain difficult to achieve. We'll explain how biology-inspired techniques can reduce the number of weights from quadratic to linear in the number of neurons. Compared to fully connected neural networks, these structurally sparse neural networks achieve large speedups during both training and inference while maintaining or even improving model accuracy. We'll discuss hardware considerations and results for feed-forward and recurrent networks.
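To make the quadratic-to-linear claim concrete, here is a minimal sketch (not the speakers' actual method, whose connectivity scheme the abstract does not specify): a dense layer mapping n neurons to n neurons stores n² weights, while a structurally sparse layer in which every output neuron reads from a fixed, small number of inputs (`fan_in`, an assumed parameter) stores only fan_in·n weights — linear in n. Because the sparsity pattern is fixed and regular, the forward pass is a simple gather-and-reduce, which is the kind of structure hardware can accelerate.

```python
import numpy as np

def dense_param_count(n):
    # Fully connected n -> n layer: n * n weights (quadratic in n).
    return n * n

def sparse_param_count(n, fan_in):
    # Structurally sparse n -> n layer with fixed fan-in:
    # fan_in * n weights (linear in n).
    return fan_in * n

def sparse_forward(x, weights, indices):
    # x:       (n,)   input activations
    # weights: (n, k) one weight per (output neuron, incoming edge)
    # indices: (n, k) fixed connectivity: which inputs each neuron reads
    # Each output neuron i computes a dot product over its k inputs only.
    return np.einsum('nk,nk->n', weights, x[indices])

n, k = 1024, 8
rng = np.random.default_rng(0)
indices = rng.integers(0, n, size=(n, k))   # fixed sparse structure
weights = rng.standard_normal((n, k))
x = rng.standard_normal(n)
y = sparse_forward(x, weights, indices)

print(dense_param_count(n))      # 1048576 weights for the dense layer
print(sparse_param_count(n, k))  # 8192 weights for the sparse layer
```

Here the structure is chosen randomly for illustration; the point is that the pattern is fixed before training, so both training and inference touch only O(n) weights instead of O(n²).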