Advanced Optimizations of Persistent Recurrent Neural Networks
Vasily Volkov, NVIDIA | Jeremy Appleyard, NVIDIA
GTC 2020
Recurrent Neural Networks (RNNs) with small batch sizes tend to be bandwidth-bound when implemented naively. Persisting the majority of the inputs in low-level GPU memory can turn the problem back into a compute-bound one and yield order-of-magnitude speedups. We'll dive into the methods we use to achieve high performance in cuDNN's persistent RNN implementation, many of which are applicable to other persistent methods.
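
The bandwidth-bound claim can be illustrated with a back-of-the-envelope estimate of arithmetic intensity (FLOPs per byte of off-chip traffic) for one recurrent step. This is a simplified sketch and not taken from the talk: it models only a single hidden-to-hidden matrix multiply, assumes the recurrent weights are the data kept resident on chip, and uses a hypothetical function name and illustrative sizes (hidden size 1024, batch 4, 2-byte elements).

```python
def recurrent_step_intensity(hidden, batch, bytes_per_elem=2, weights_resident=False):
    """FLOPs per byte of off-chip traffic for one step h_t = W @ h_{t-1}.

    Simplified model (assumption): one hidden x hidden weight matrix,
    no gates, no input projection, no caching effects.
    """
    # One multiply and one add per weight element per batch column.
    flops = 2 * hidden * hidden * batch
    # Naive case: the weight matrix is re-read from off-chip memory every step.
    weight_bytes = 0 if weights_resident else hidden * hidden * bytes_per_elem
    # Read the previous hidden state and write the new one.
    activation_bytes = 2 * hidden * batch * bytes_per_elem
    return flops / (weight_bytes + activation_bytes)


naive = recurrent_step_intensity(1024, 4)
persistent = recurrent_step_intensity(1024, 4, weights_resident=True)

print(f"naive:      {naive:.2f} FLOPs/byte")
print(f"persistent: {persistent:.2f} FLOPs/byte")
```

Under these assumptions the naive step performs only about as many FLOPs per byte as the batch size, because weight traffic dominates. Once the weights no longer need to be reloaded each step, the same computation moves far less data, which is the sense in which persistence can shift the problem from bandwidth-bound toward compute-bound.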