NVIDIA GPUDirect™

Whether racing to model fast-moving financial markets, exploring mountains of geological data, or researching solutions to complex scientific problems, you need a computing platform that delivers the highest throughput and lowest latency possible. GPU-accelerated clusters and workstations are widely recognized for providing the tremendous horsepower required to perform compute-intensive workloads, and your applications can achieve even faster results with NVIDIA GPUDirect™.

First released in June 2010, GPUDirect is supported by InfiniBand solutions from Mellanox and QLogic, and other vendors are now adding GPUDirect support to their hardware and software products.

Using GPUDirect, third-party network adapters, solid-state drives (SSDs), and other devices can read and write CUDA host memory directly. This eliminates unnecessary system memory copies and CPU overhead, significantly reducing data transfer times on NVIDIA Tesla™ and Quadro™ products.
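As a rough illustration of the CUDA side of this path, the sketch below allocates pinned (page-locked) CUDA host memory and copies it to the GPU asynchronously. It uses only standard CUDA runtime calls. How a third-party driver shares those pinned pages is vendor-specific and is not shown. The function name `stagePinnedBuffer` and the fill pattern are illustrative assumptions, not part of GPUDirect itself.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: stages `bytes` of data in pinned host memory and
// copies it to the current GPU.
// Returns 0 on success, 1 if no CUDA device is present, -1 on a CUDA error.
int stagePinnedBuffer(size_t bytes) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        return 1;  // nothing to demonstrate without a GPU
    }

    void *hostBuf = nullptr;
    void *devBuf = nullptr;
    cudaStream_t stream = nullptr;

    // Page-locked host allocation. cudaHostAllocPortable makes the buffer
    // count as pinned for every CUDA context in the process.
    bool ok = cudaHostAlloc(&hostBuf, bytes, cudaHostAllocPortable) == cudaSuccess;
    ok = ok && cudaMalloc(&devBuf, bytes) == cudaSuccess;
    ok = ok && cudaStreamCreate(&stream) == cudaSuccess;

    if (ok) {
        // Stand-in for data that a network adapter or SSD would deliver
        // into the pinned buffer (assumption for illustration only).
        std::memset(hostBuf, 0xAB, bytes);

        // Pinned memory lets the GPU DMA engine read the buffer directly,
        // so the copy can be queued asynchronously on a stream.
        ok = cudaMemcpyAsync(devBuf, hostBuf, bytes,
                             cudaMemcpyHostToDevice, stream) == cudaSuccess;
        ok = ok && cudaStreamSynchronize(stream) == cudaSuccess;
    }

    if (stream) cudaStreamDestroy(stream);
    if (devBuf) cudaFree(devBuf);
    if (hostBuf) cudaFreeHost(hostBuf);
    return ok ? 0 : -1;
}

int main() {
    int rc = stagePinnedBuffer(1 << 20);  // 1 MiB
    std::printf("stagePinnedBuffer returned %d\n", rc);
    return rc < 0 ? 1 : 0;
}
```

The return code separates "no GPU available" from a real CUDA error, so the sketch can be built and run on a machine without a GPU and will simply report that it was skipped.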

GPUDirect also includes support for peer-to-peer (P2P) DMA transfers directly between GPUs and NUMA-style direct access to GPU memory from other GPUs. These capabilities lay the foundation for direct P2P communication between GPUs and other devices in a future release.
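A P2P transfer of this kind might look like the following minimal sketch, which copies a buffer from GPU 0 to GPU 1 with `cudaMemcpyPeer` after enabling peer access in both directions. The device ordinals 0 and 1, the helper name `peerCopyDemo`, and the test data are assumptions for illustration. The sketch skips itself when fewer than two P2P-capable GPUs are present.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: copies `count` ints from GPU 0 to GPU 1 and verifies them.
// Returns 0 if the data arrived intact, 1 if skipped (fewer than two
// P2P-capable GPUs), -1 on a CUDA error or data mismatch.
int peerCopyDemo(size_t count) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount < 2) {
        return 1;
    }

    // Ask the driver whether each GPU can address the other's memory.
    int can01 = 0, can10 = 0;
    if (cudaDeviceCanAccessPeer(&can01, 0, 1) != cudaSuccess ||
        cudaDeviceCanAccessPeer(&can10, 1, 0) != cudaSuccess ||
        !can01 || !can10) {
        return 1;
    }

    std::vector<int> src(count), dst(count, 0);
    for (size_t i = 0; i < count; ++i) src[i] = static_cast<int>(i);
    const size_t bytes = count * sizeof(int);
    int *buf0 = nullptr;  // lives on GPU 0
    int *buf1 = nullptr;  // lives on GPU 1

    bool ok = cudaSetDevice(0) == cudaSuccess;
    ok = ok && cudaDeviceEnablePeerAccess(1, 0) == cudaSuccess;
    ok = ok && cudaMalloc(&buf0, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(buf0, src.data(), bytes,
                          cudaMemcpyHostToDevice) == cudaSuccess;

    ok = ok && cudaSetDevice(1) == cudaSuccess;
    ok = ok && cudaDeviceEnablePeerAccess(0, 0) == cudaSuccess;
    ok = ok && cudaMalloc(&buf1, bytes) == cudaSuccess;

    // GPU 0 -> GPU 1. With peer access enabled, the copy does not need
    // to be staged through system memory.
    ok = ok && cudaMemcpyPeer(buf1, 1, buf0, 0, bytes) == cudaSuccess;

    // Read the result back from GPU 1 and compare with the original data.
    ok = ok && cudaMemcpy(dst.data(), buf1, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && (src == dst);

    if (buf1) { cudaSetDevice(1); cudaFree(buf1); }
    if (buf0) { cudaSetDevice(0); cudaFree(buf0); }
    return ok ? 0 : -1;
}

int main() {
    int rc = peerCopyDemo(1 << 16);
    std::printf("peerCopyDemo returned %d\n", rc);
    return rc < 0 ? 1 : 0;
}
```

`cudaMemcpyPeer` also works when peer access is not enabled, in which case the runtime stages the copy through the host. Enabling peer access first is what allows the direct GPU-to-GPU path.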

For more information, see the GPUDirect Technology Overview presentation.

Key Features:

  • Accelerated communication with network and storage devices
    Avoid unnecessary system memory copies and CPU overhead by copying data directly to/from pinned CUDA host memory
  • Peer-to-Peer Transfers between GPUs
    Use high-speed DMA transfers to copy data from one GPU directly to another GPU in the same system
  • Peer-to-Peer memory access
    Optimize communication between GPUs using NUMA-style access to memory on other GPUs from within CUDA kernels
  • GPUDirect for Video
    Optimized pipeline for frame-based devices such as frame grabbers, video switchers, HD-SDI capture, and CameraLink devices.
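To make the peer-to-peer memory access feature above more concrete, here is a minimal sketch in which a kernel running on GPU 1 dereferences a pointer to memory allocated on GPU 0. It assumes a 64-bit process with unified virtual addressing and two GPUs that report peer capability. The kernel name `scaleFromPeer`, the helper `peerAccessDemo`, and the device ordinals are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Runs on GPU 1. `peerIn` points at memory that physically lives on GPU 0.
__global__ void scaleFromPeer(const float *peerIn, float *localOut,
                              int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) localOut[i] = factor * peerIn[i];
}

// Hypothetical helper. Returns 0 if the results are correct, 1 if skipped
// (no peer-capable GPU pair), -1 on a CUDA error or wrong result.
int peerAccessDemo(int n) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount < 2) {
        return 1;
    }

    // Can GPU 1 directly address memory on GPU 0?
    int canAccess = 0;
    if (cudaDeviceCanAccessPeer(&canAccess, 1, 0) != cudaSuccess || !canAccess) {
        return 1;
    }

    std::vector<float> in(n), out(n, 0.0f);
    for (int i = 0; i < n; ++i) in[i] = static_cast<float>(i);
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *onGpu0 = nullptr;
    float *onGpu1 = nullptr;

    // Place the input on GPU 0.
    bool ok = cudaSetDevice(0) == cudaSuccess;
    ok = ok && cudaMalloc(&onGpu0, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(onGpu0, in.data(), bytes,
                          cudaMemcpyHostToDevice) == cudaSuccess;

    // Switch to GPU 1 and map GPU 0's memory into its address space.
    ok = ok && cudaSetDevice(1) == cudaSuccess;
    ok = ok && cudaDeviceEnablePeerAccess(0, 0) == cudaSuccess;
    ok = ok && cudaMalloc(&onGpu1, bytes) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        // The kernel reads GPU 0's buffer in place, with no explicit copy.
        scaleFromPeer<<<blocks, threads>>>(onGpu0, onGpu1, n, 2.0f);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // This blocking copy also waits for the kernel to finish.
    ok = ok && cudaMemcpy(out.data(), onGpu1, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = (out[i] == 2.0f * in[i]);

    if (onGpu1) { cudaSetDevice(1); cudaFree(onGpu1); }
    if (onGpu0) { cudaSetDevice(0); cudaFree(onGpu0); }
    return ok ? 0 : -1;
}

int main() {
    int rc = peerAccessDemo(1 << 16);
    std::printf("peerAccessDemo returned %d\n", rc);
    return rc < 0 ? 1 : 0;
}
```

Direct access suits irregular or fine-grained reads. For large contiguous blocks, an explicit peer copy is often the simpler choice.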

MPI applications automatically benefit from GPUDirect accelerated network communication. 

The diagrams below show how GPUDirect technologies work:

[Figure: NVIDIA GPUDirect™ Accelerated Communication with Network and Storage Devices]

[Figure: NVIDIA GPUDirect Peer-to-Peer (P2P) Communication Between GPUs on the Same PCIe Bus]

How Do I Get GPUDirect?

GPUDirect peer-to-peer transfers and memory access are supported natively by the CUDA Driver. All you need is CUDA Toolkit v4.0 with R270 drivers (or later) and a system with two or more Fermi-architecture GPUs on the same PCIe bus. For more information on using GPUDirect P2P communication in your applications, please see:

GPUDirect accelerated communication with network and storage devices is supported on Tesla M-class and Tesla S-class datacenter products running Red Hat Enterprise Linux (RHEL). Support for additional GPUs and Linux distributions will be added in future releases.

To enable this feature on your systems, please download the file below and follow the installation instructions in the README file.

Download: nvidia-gpudirect-3.2-1.tar.gz

You may also need to contact your InfiniBand vendor and/or install updated drivers for adapters using GPUDirect v1.0. Please use the links below or contact your InfiniBand vendor directly:

  • For the OFED Driver Image for Mellanox ConnectX-2 InfiniBand adapters, contact hpc@mellanox.com.
  • Support for QLogic TrueScale HCAs is available using QLogic OFED+ 6.1 or QLogic InfiniBand Fabric Suite 6.1. Please contact QLogic technical support if you have any questions or want more details.

Frequently Asked Questions

Q: My company makes network adapters / storage devices. How do we enable our products for GPUDirect?