Achieve up to 75% Performance Improvement for Communication-Intensive HPC Applications with NVTAGS

Many GPU-accelerated HPC applications spend a substantial portion of their runtime in non-uniform GPU-to-GPU communication. In many HPC systems, different GPU pairs are connected by links with different bandwidth and latency, so the way processes are assigned to GPUs can substantially affect time to solution. On multi-node and multi-socket systems, communication performance can also degrade when GPUs communicate with CPUs and NICs outside their affinity. Because the best choice depends on the system, it is challenging to select resources that minimize communication costs.

NVIDIA Topology-Aware GPU Selection (NVTAGS) abstracts away the complexity of efficient resource selection. NVTAGS automates intelligent GPU assignment: it profiles an HPC application and then launches it with a custom GPU assignment, tailored to both the application and the system, that minimizes communication costs. NVTAGS also ensures that, regardless of a system's communication topology, MPI processes communicate with the CPUs and NICs or HCAs within their own affinity.
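To make the idea of topology-aware assignment concrete, here is a minimal Python sketch of the underlying optimization problem. It is not NVTAGS code and does not claim to reflect how NVTAGS works internally. The `traffic` and `link_cost` matrices, the function names, and the brute-force search are all assumptions made for illustration. In practice a tool would obtain the traffic pattern by profiling the application and the link costs by detecting the system topology.

```python
from itertools import permutations

def comm_cost(assignment, traffic, link_cost):
    """Sum, over every pair of ranks, of traffic volume times the cost of
    the link between the GPUs those two ranks are assigned to."""
    n = len(assignment)
    return sum(
        traffic[i][j] * link_cost[assignment[i]][assignment[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )

def best_assignment(traffic, link_cost):
    """Exhaustively search rank-to-GPU assignments (only feasible for small n)."""
    n = len(traffic)
    return min(
        permutations(range(n)),
        key=lambda a: comm_cost(a, traffic, link_cost),
    )

# Hypothetical profile: ranks 0<->2 and 1<->3 exchange most of the data.
traffic = [
    [0, 1, 10, 1],
    [1, 0, 1, 10],
    [10, 1, 0, 1],
    [1, 10, 1, 0],
]

# Hypothetical topology: GPU pairs (0,1) and (2,3) share a fast link (cost 1);
# every other pair goes over a slower path (cost 4).
link_cost = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]

default = (0, 1, 2, 3)  # rank i runs on GPU i
best = best_assignment(traffic, link_cost)

print("default cost:", comm_cost(default, traffic, link_cost))
print("best assignment:", best, "cost:", comm_cost(best, traffic, link_cost))
```

In this toy setup the default mapping sends the heavy traffic over the slow links, while the best mapping places each heavily communicating pair of ranks on a fast link. Exhaustive search grows factorially with the number of GPUs, so it is only a way to illustrate the objective, not a practical method for large systems.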

NVTAGS improves the performance of Chroma, MILC, and LAMMPS by 2% to 75% on 1 to 16 nodes.

Key NVTAGS Features:

  • Automated topology detection along with CPU and NIC/HCA binding, independent of the system and HPC application
  • Support for single- and multi-node, PCIe, and NVIDIA NVLink with NVIDIA Pascal, Volta, and Ampere architecture GPUs
  • Automatic caching of efficient GPU selection for future simulations
  • Straightforward integration with Slurm and Singularity

Download NVTAGS 1.0.0 today. 

Additional Resources:

NVTAGS Product Page
Blog: Overcoming Communication Congestion for HPC Applications with NVIDIA NVTAGS