GTC 2020: Advanced Optimizations of Persistent Recurrent Neural Networks
Advanced Optimizations of Persistent Recurrent Neural Networks
Vasily Volkov, NVIDIA | Jeremy Appleyard, NVIDIA
Recurrent Neural Networks (RNNs) with small batch sizes tend to be bandwidth-bound when implemented naively. Persisting the recurrent weights in low-level, on-chip GPU memory avoids reloading them at every time step, turning the problem back into a compute-bound one and yielding order-of-magnitude speedups. We'll dive into the methods we use to achieve performance in cuDNN's persistent RNN implementation, many of which are applicable to other persistent methods.
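The bandwidth-bound claim can be sanity-checked with a back-of-envelope arithmetic-intensity calculation. The sketch below is illustrative and not from the talk; the function name and the example sizes (hidden size 1024, batch 4, FP16 weights) are assumptions chosen for round numbers.

```python
def rnn_step_intensity(hidden, batch, bytes_per_elem=2):
    """FLOPs per byte of weight traffic for one recurrent GEMM,
    h_t = W @ h_{t-1}, when W is reloaded from DRAM every time step.
    (Illustrative model; ignores activation traffic, which is small
    relative to the weights at small batch sizes.)"""
    flops = 2 * hidden * hidden * batch           # one multiply-add per output element
    weight_bytes = hidden * hidden * bytes_per_elem
    return flops / weight_bytes

# At small batch, intensity collapses to batch / bytes_per_elem * 2:
print(rnn_step_intensity(hidden=1024, batch=4))   # -> 4.0 FLOPs/byte
```

Modern GPUs need on the order of 100 FLOPs per byte of DRAM traffic to run compute-bound, so at 4 FLOPs/byte the GEMM stalls on weight loads regardless of hidden size. Persisting W on-chip (e.g., in the register file) eliminates the per-step reload, which is the effect the talk's persistent implementation exploits.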