
GTC Silicon Valley 2019, ID S9218: Filling the Performance Gap in Convolution Implementations for NVIDIA GPUs

Antonio J. Peña (Barcelona Supercomputing Center (BSC)), Pedro Valero-Lara (Barcelona Supercomputing Center (BSC))
We'll discuss an implementation of GPU convolution that favors coalesced memory accesses without requiring prior data transformations. Convolutions are the core operation of deep learning applications based on convolutional neural networks (CNNs). GPUs are typically used for training deep CNNs, but some state-of-the-art convolution implementations are inefficient for some commonly used network configurations. We'll discuss experiments with our new implementation, which yielded notable performance improvements, including speedups of up to 2.29x, across a wide range of common CNN configurations.
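
To make the phrase "coalesced accesses without prior data transformations" more concrete, here is a minimal CPU-side sketch of a direct convolution that reads the input buffer in its original row-major layout, with no im2col-style rearrangement beforehand. It is not the speakers' implementation; the function names `direct_conv2d` and `input_index` are hypothetical, and the sketch assumes a single channel, stride 1, and no padding. The point it illustrates is only the indexing: for a fixed filter tap, adjacent output columns read adjacent input addresses, which is the access pattern that, on a GPU with one thread per output column, could be served by coalesced loads.

```python
def input_index(oy, ox, ky, kx, width):
    """Flat row-major address read by output (oy, ox) for filter tap (ky, kx)."""
    return (oy + ky) * width + (ox + kx)


def direct_conv2d(inp, height, width, kernel, k_h, k_w):
    """Direct 2D convolution (cross-correlation form, stride 1, no padding).

    `inp` and `kernel` are flat row-major lists; the input is used as-is,
    with no layout transformation before the computation.
    """
    out_h = height - k_h + 1
    out_w = width - k_w + 1
    out = [0.0] * (out_h * out_w)
    for oy in range(out_h):
        # Assumption for illustration: on a GPU, each ox would map to one thread.
        for ox in range(out_w):
            acc = 0.0
            for ky in range(k_h):
                for kx in range(k_w):
                    acc += inp[input_index(oy, ox, ky, kx, width)] * kernel[ky * k_w + kx]
            out[oy * out_w + ox] = acc
    return out, out_h, out_w


if __name__ == "__main__":
    image = [1, 2, 3,
             4, 5, 6,
             7, 8, 9]
    taps = [1, 0,
            0, 1]
    result, out_h, out_w = direct_conv2d(image, 3, 3, taps, 2, 2)
    print(result)  # [6.0, 8.0, 12.0, 14.0]
```

In this sketch, `input_index(oy, ox + 1, ky, kx, width) - input_index(oy, ox, ky, kx, width)` is always 1, so neighboring output columns touch neighboring input elements for every filter tap.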
