NVIDIA Developer Zone

NVIDIA GPUDirect™

Whether you are exploring mountains of geological data, researching solutions to complex scientific problems, or racing to model fast-moving financial markets, you need a computing platform that delivers the highest throughput and lowest latency possible. GPU-accelerated clusters and workstations are widely recognized for providing the tremendous horsepower required to perform compute-intensive workloads, and your applications can achieve even faster results with NVIDIA GPUDirect™.

First released in June 2010, GPUDirect is supported by InfiniBand solutions from Mellanox and QLogic, and other vendors are now adding GPUDirect support to their hardware and software products.

Using GPUDirect, third-party network adapters, solid-state drives (SSDs), and other devices can directly read and write CUDA host and device memory. This eliminates unnecessary system memory copies and CPU overhead, significantly reducing data transfer times on NVIDIA Tesla™ and Quadro™ products.

For more information, see the GPUDirect Technology Overview presentation.
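As an illustration of the pinned CUDA host memory that the accelerated communication path relies on, here is a minimal sketch using the CUDA runtime API. It only shows the CUDA side: a page-locked buffer is allocated with `cudaHostAlloc` and moved to and from the GPU with asynchronous copies. Handing the same buffer to a network or storage driver depends on the vendor's software stack and is not shown; the buffer names and sizes are assumptions made for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

int main()
{
    const size_t n = 1 << 20;                 // illustrative element count
    const size_t bytes = n * sizeof(float);

    int device_count = 0;
    CHECK(cudaGetDeviceCount(&device_count));
    if (device_count < 1) {
        std::printf("No CUDA device found.\n");
        return 0;
    }

    // Page-locked (pinned) host buffers. In a GPUDirect setup, buffers like
    // these are what a third-party device driver would read or write
    // directly; that registration step is vendor-specific and omitted here.
    float *send_buf = nullptr, *recv_buf = nullptr;
    CHECK(cudaHostAlloc((void **)&send_buf, bytes, cudaHostAllocPortable));
    CHECK(cudaHostAlloc((void **)&recv_buf, bytes, cudaHostAllocPortable));
    for (size_t i = 0; i < n; ++i) {
        send_buf[i] = static_cast<float>(i);
        recv_buf[i] = 0.0f;
    }

    float *dev_buf = nullptr;
    CHECK(cudaMalloc((void **)&dev_buf, bytes));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Because the host buffers are pinned, these copies can be performed by
    // DMA without an intermediate staging copy in pageable system memory.
    CHECK(cudaMemcpyAsync(dev_buf, send_buf, bytes, cudaMemcpyHostToDevice, stream));
    CHECK(cudaMemcpyAsync(recv_buf, dev_buf, bytes, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));

    // Verify the round trip.
    bool ok = true;
    for (size_t i = 0; i < n && ok; ++i) ok = (recv_buf[i] == send_buf[i]);
    std::printf("Round trip through pinned memory %s\n", ok ? "succeeded" : "FAILED");

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dev_buf));
    CHECK(cudaFreeHost(send_buf));
    CHECK(cudaFreeHost(recv_buf));
    return ok ? 0 : 1;
}
```

The `cudaHostAllocPortable` flag is chosen here so that the pinned buffer is usable from every CUDA context in the process, which is a reasonable default on multi-GPU systems; `cudaMallocHost` would work equally well for a single-GPU sketch.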

Key Features:

  • Accelerated communication with network and storage devices
    Avoid unnecessary system memory copies and CPU overhead by copying data directly to/from pinned CUDA host memory
  • Peer-to-Peer Transfers between GPUs
    Use high-speed DMA transfers to copy data from one GPU directly to another GPU in the same system
  • Peer-to-Peer memory access
    Optimize communication between GPUs using NUMA-style access to memory on other GPUs from within CUDA kernels
  • RDMA
    Eliminate CPU bandwidth and latency bottlenecks using direct memory access (DMA) between GPUs and other PCIe devices, resulting in significantly improved MPI send/receive efficiency between GPUs and other nodes
  • GPUDirect for Video
    Optimized pipeline for frame-based devices such as frame grabbers, video switchers, HD-SDI capture, and CameraLink devices
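The two peer-to-peer features in the list above can be sketched with the CUDA runtime API as follows. This is a minimal example, assuming a system where devices 0 and 1 report peer capability. It first copies a buffer from GPU 0 to GPU 1 with `cudaMemcpyPeer` (peer-to-peer transfer), then launches a kernel on GPU 0 that dereferences a pointer into GPU 1's memory (peer-to-peer memory access). The kernel name `scale_from_peer`, the buffer names, and the sizes are made up for the illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Launched on GPU 0; `peer_src` points into GPU 1's memory.
__global__ void scale_from_peer(const float *peer_src, float *dst, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = factor * peer_src[i];
}

int main()
{
    const int n = 1 << 20;                    // illustrative element count
    const size_t bytes = n * sizeof(float);

    int device_count = 0;
    CHECK(cudaGetDeviceCount(&device_count));
    if (device_count < 2) {
        std::printf("Need at least two GPUs; found %d.\n", device_count);
        return 0;
    }

    // Ask the driver whether each GPU can address the other's memory.
    int can_0_to_1 = 0, can_1_to_0 = 0;
    CHECK(cudaDeviceCanAccessPeer(&can_0_to_1, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&can_1_to_0, 1, 0));
    if (!can_0_to_1 || !can_1_to_0) {
        std::printf("GPUs 0 and 1 do not support peer access to each other.\n");
        return 0;
    }

    float *buf0 = nullptr, *out0 = nullptr, *buf1 = nullptr;
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void **)&buf1, bytes));
    CHECK(cudaDeviceEnablePeerAccess(0, 0));  // GPU 1 may access GPU 0 memory
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void **)&buf0, bytes));
    CHECK(cudaMalloc((void **)&out0, bytes));
    CHECK(cudaDeviceEnablePeerAccess(1, 0));  // GPU 0 may access GPU 1 memory

    std::vector<float> host(n, 1.0f);
    CHECK(cudaMemcpy(buf0, host.data(), bytes, cudaMemcpyHostToDevice));

    // Peer-to-peer transfer: copy GPU 0 memory to GPU 1.
    CHECK(cudaMemcpyPeer(buf1, 1, buf0, 0, bytes));

    // Peer-to-peer memory access: a kernel on GPU 0 reads GPU 1's buffer.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale_from_peer<<<blocks, threads>>>(buf1, out0, n, 2.0f);
    CHECK(cudaGetLastError());

    // Blocking copy back to the host; it also waits for the kernel to finish.
    CHECK(cudaMemcpy(host.data(), out0, bytes, cudaMemcpyDeviceToHost));
    bool ok = true;
    for (int i = 0; i < n && ok; ++i) ok = (host[i] == 2.0f);
    std::printf("Peer-to-peer copy and access %s\n", ok ? "succeeded" : "FAILED");

    CHECK(cudaFree(buf0));
    CHECK(cudaFree(out0));
    CHECK(cudaSetDevice(1));
    CHECK(cudaFree(buf1));
    return ok ? 0 : 1;
}
```

Peer access is enabled in both directions here for simplicity. Only the GPU 0 to GPU 1 mapping is strictly needed for the kernel, and `cudaMemcpyPeer` still works without it, though the copy may then be staged through host memory.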

The diagrams below show how GPUDirect technologies work:

[Figure: GPUDirect™ support for RDMA, introduced with CUDA 5]

[Figure: NVIDIA GPUDirect™ Accelerated Communication with Network and Storage Devices]

[Figure: NVIDIA GPUDirect Peer-to-Peer (P2P) Communication Between GPUs on the Same PCIe Bus]

How Do I Get GPUDirect?

GPUDirect peer-to-peer transfers and memory access are supported natively by the CUDA Driver. All you need is CUDA Toolkit v4.0 with R270 drivers (or later) and a system with two or more Fermi-architecture GPUs on the same PCIe bus. For more information on using GPUDirect P2P communication in your applications, please see:

GPUDirect accelerated communication with network and storage devices is supported on Tesla M-class and Tesla S-class datacenter products running Red Hat Enterprise Linux (RHEL). Support for additional GPUs and Linux distributions will be added in future releases.

You may also need to contact your InfiniBand vendor and/or install updated drivers for adapters using GPUDirect v1.0. Please use the links below or contact your InfiniBand vendor directly:

  • For the OFED Driver Image for Mellanox ConnectX-2 InfiniBand adapters, contact hpc@mellanox.com.
  • Support for QLogic TrueScale HCAs is available using QLogic OFED+ 6.1 or QLogic InfiniBand Fabric Suite 6.1. Please contact QLogic technical support if you have any questions or want more details.

Frequently Asked Questions

Q: My company makes network adapters or storage devices. How do we enable our products for GPUDirect?