GTC Silicon Valley 2019, ID S9956: Best Practices When Benchmarking CUDA Applications

Bill Fiser (NVIDIA), Sebastian Jodlowski (NVIDIA)
We'll explain how to configure a system for benchmarking CUDA applications, point out common mistakes, and describe how to avoid them. Measuring performance in a deterministic and reproducible way is difficult. It is particularly challenging on GPU-accelerated heterogeneous systems, where complex interactions among CPUs, GPUs, the memory subsystem, the OS, and many other factors must be accounted for. We will cover topics such as power management, system topology, NUMA awareness, thread affinity, OS thread scheduling, and CUDA JIT caches.

