At NIPS 2017, NVIDIA announced new software releases for deep learning and HPC developers. The latest SDK updates include new capabilities and performance optimizations to TensorRT, CUDA toolkit and the new project CUTLASS library.
Matrix multiplication is a key computation within many scientific applications, particularly those in deep learning. Many operations in modern deep neural networks are either defined as matrix multiplications or can be cast as such.
Once you have built, trained, tweaked and tuned your deep learning model, you need an inference solution that you need to deploy to a datacenter or to the cloud, and you need to get the maximum possible performance.