Roofline Performance Model for HPC and Deep-Learning Applications
Charlene Yang, NERSC, Lawrence Berkeley National Laboratory | Samuel Williams, CRD, Lawrence Berkeley National Laboratory | Yunsong Wang, NERSC, Lawrence Berkeley National Laboratory
Learn how to use the Roofline model to analyze the performance of GPU-accelerated applications. We'll cover the basics of the model, explain how to use tools such as nvprof and Nsight Systems/Compute to automate data collection, and demonstrate how to track optimization progress using Roofline for both HPC and deep-learning applications. We'll use examples such as GPP from materials science, high-performance geometric multigrid from adaptive mesh refinement, and two kernels from TensorFlow to show how characteristics such as arithmetic intensity, memory access pattern, and thread divergence/predication can all be captured by Roofline, offering useful insights into performance optimization.
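As a rough illustration of the basics of the model, the sketch below computes a kernel's arithmetic intensity and the performance bound that Roofline predicts for it. It is a minimal example, not the speakers' tooling: the function names, the machine ceilings, and the kernel's FLOP and byte counts are all hypothetical values chosen for illustration. In practice, counts like these would come from a profiler such as nvprof or Nsight Compute.

```python
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte moved to or from memory."""
    return flops / bytes_moved


def roofline_bound(ai, peak_gflops, peak_bandwidth_gbs):
    """Attainable GFLOP/s: the lower of the compute and bandwidth ceilings."""
    return min(peak_gflops, ai * peak_bandwidth_gbs)


# Hypothetical machine ceilings (assumptions, not measured values).
PEAK_GFLOPS = 7000.0   # compute ceiling in GFLOP/s
PEAK_BW_GBS = 900.0    # memory bandwidth ceiling in GB/s

# Hypothetical kernel counts (assumptions, not profiler output).
ai = arithmetic_intensity(flops=2.0e9, bytes_moved=8.0e9)
bound = roofline_bound(ai, PEAK_GFLOPS, PEAK_BW_GBS)

# The ridge point is the intensity at which the two ceilings meet.
ridge = PEAK_GFLOPS / PEAK_BW_GBS
regime = "memory-bound" if ai < ridge else "compute-bound"

print(f"arithmetic intensity: {ai:.2f} FLOP/byte")
print(f"Roofline bound: {bound:.1f} GFLOP/s ({regime})")
```

With these assumed numbers the kernel sits to the left of the ridge point, so its bound is set by memory bandwidth rather than by peak compute. Raising its arithmetic intensity, for example through better data reuse, would move it toward the compute ceiling.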