CUDA Pro Tip: Increase Application Performance with NVIDIA GPU Boost

NVIDIA GPU Boost™ is a feature available on NVIDIA® GeForce® and NVIDIA® Tesla® products. It uses any available power headroom to boost application performance. On Tesla, GPU Boost is customized for compute-intensive workloads running on clusters. This Pro Tip is useful for anyone who wants to take advantage of the power headroom on a Tesla K40 in a server or workstation. Note that GPU Boost is a system setting, so this Pro Tip applies to any user of a CUDA-accelerated application, not just developers.

The Tesla K40 board targets a specific power budget (235 W) when running a highly optimized compute workload, but HPC workloads vary in power consumption and profile, as the graph in Figure 1 shows. Many applications draw less than the full power budget, which leaves power headroom. NVIDIA GPU Boost for Tesla lets customers use that headroom by selecting higher graphics clocks with NVML (the NVIDIA Management Library) or nvidia-smi.
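As a minimal sketch of the NVML route, the Python below uses the nvidia-ml-py bindings (imported as pynvml); using these bindings is an assumption, since the text names only NVML and nvidia-smi. It queries the supported memory and graphics clocks, picks the highest graphics clock available at the highest memory clock, and sets that pair as the GPU's application clocks. The helper names highest_clock_pair and boost_gpu are hypothetical, and changing application clocks typically requires root privileges.

```python
"""Sketch: raise a GPU's application clocks through NVML (pynvml bindings).

Assumes the nvidia-ml-py package (imported as pynvml) is installed and the
GPU supports application clocks, as the Tesla K40 does.
"""
from typing import Dict, List, Tuple


def highest_clock_pair(supported: Dict[int, List[int]]) -> Tuple[int, int]:
    """Return (memory_mhz, graphics_mhz): the highest graphics clock
    available at the highest supported memory clock."""
    if not supported:
        raise ValueError("no supported clocks reported")
    mem_mhz = max(supported)
    gfx_clocks = supported[mem_mhz]
    if not gfx_clocks:
        raise ValueError(f"no graphics clocks listed for {mem_mhz} MHz memory")
    return mem_mhz, max(gfx_clocks)


def boost_gpu(index: int = 0) -> Tuple[int, int]:
    """Set GPU `index` to its highest supported application clock pair.

    Typically needs root privileges. Returns the (memory, graphics) MHz
    that the driver reports after the change.
    """
    # Imported lazily so highest_clock_pair is usable without pynvml installed.
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        # Map each supported memory clock to the graphics clocks allowed with it.
        supported = {
            mem: pynvml.nvmlDeviceGetSupportedGraphicsClocks(handle, mem)
            for mem in pynvml.nvmlDeviceGetSupportedMemoryClocks(handle)
        }
        mem_mhz, gfx_mhz = highest_clock_pair(supported)
        pynvml.nvmlDeviceSetApplicationsClocks(handle, mem_mhz, gfx_mhz)
        # Read back the application clocks the driver actually applied.
        return (
            pynvml.nvmlDeviceGetApplicationsClock(handle, pynvml.NVML_CLOCK_MEM),
            pynvml.nvmlDeviceGetApplicationsClock(handle, pynvml.NVML_CLOCK_GRAPHICS),
        )
    finally:
        pynvml.nvmlShutdown()
```

For example, with illustrative K40-style values, highest_clock_pair({3004: [875, 810, 745, 666], 324: [324]}) returns (3004, 875). Keeping the selection logic separate from the NVML calls lets it be checked without a GPU. The equivalent nvidia-smi command is nvidia-smi -ac 3004,875, and nvidia-smi -rac resets the application clocks to their defaults.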

Figure 1: Average GPU Power Consumption for Real Applications on Tesla K20X.

In an earlier post, Saad Rahim benchmarked two applications at different clock rates on the K40: Reverse Time Migration (RTM), a depth migration algorithm used to image complex geologies, and a finite-difference time-domain (FDTD) electromagnetic solver. Simply by using GPU Boost, he measured an 18.5% performance increase in RTM over base clocks and an improvement of over 14% in FDTD, as shown in Figure 2.

Figure 2: Performance of RTM and FDTD benchmarks normalized to K40 GPU base clock rate of 745 MHz. (Image courtesy of Acceleware.)

You may notice a slightly superlinear performance improvement in Figure 2 for the RTM TTI benchmark: its speedup is slightly larger than the increase in core clock rate. This may seem surprising, but it has a simple explanation, given in the Tesla Application Note on GPU Boost:

In the Tesla K40, the NVIDIA GPU Boost capability allows end users to specify the boost clock which is just the core clock. The memory clock remains at 3 GHz. However, selecting higher boost clocks does improve the effective memory bandwidth utilization for workloads that are sensitive to memory bandwidth. With higher boost clocks some workloads may even see improved PCIe transfer rates. Therefore, NVIDIA GPU Boost on the Tesla K40 helps workloads which are sensitive to core clocks, power headroom and also helps workloads that may be more sensitive to memory bandwidth than core clocks.
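To make "superlinear" concrete, the sketch below divides the measured speedup by the core-clock ratio: a result of 1.0 means performance scaled exactly with the clock, and anything above 1.0 is superlinear. It assumes the quoted 18.5% RTM gain was measured at 875 MHz, the highest K40 application clock; the text itself gives only the 745 MHz base clock, so treat the boost clock here as an assumption.

```python
"""Compare a measured speedup with the core-clock ratio that produced it."""


def clock_gain(base_mhz: float, boost_mhz: float) -> float:
    """Fractional increase in core clock, e.g. 0.10 for +10%."""
    return boost_mhz / base_mhz - 1.0


def scaling_efficiency(perf_gain: float, base_mhz: float, boost_mhz: float) -> float:
    """Speedup divided by clock ratio: 1.0 is linear, above 1.0 is superlinear."""
    return (1.0 + perf_gain) / (boost_mhz / base_mhz)


BASE_MHZ = 745.0   # K40 base clock, from the Figure 2 caption
BOOST_MHZ = 875.0  # assumption: the highest K40 application clock
RTM_GAIN = 0.185   # 18.5% RTM speedup quoted in the text

print(f"clock gain: {clock_gain(BASE_MHZ, BOOST_MHZ):.1%}")  # clock gain: 17.4%
print(f"RTM scaling efficiency: {scaling_efficiency(RTM_GAIN, BASE_MHZ, BOOST_MHZ):.3f}")  # RTM scaling efficiency: 1.009
```

Under that assumption, a 17.4% clock increase yielding an 18.5% speedup is consistent with the effect the application note describes: the higher core clock also lets memory-bound code use more of the available memory bandwidth.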

You can find more information on GPU Boost, including instructions, scenarios, and best practices in the Tesla Application Note on GPU Boost.