Whether you are exploring mountains of geological data, researching solutions to complex scientific problems, or racing to model fast-moving financial markets, you need a computing platform that delivers the highest throughput and lowest latency possible. GPU-accelerated clusters and workstations are widely recognized for providing the tremendous horsepower required by compute-intensive workloads. With NVIDIA GPUDirect™, compute-intensive applications can deliver results even faster.

First introduced in June 2010, GPUDirect version 1 supported accelerated communication with network and storage devices. It was supported by InfiniBand solutions available from Mellanox and others. In 2011, GPUDirect version 2 added support for peer-to-peer communication between GPUs on the same shared-memory server. GPUDirect RDMA, announced in 2013, enables RDMA transfers across an InfiniBand network between GPUs in different cluster nodes, bypassing CPU host memory altogether.

Using GPUDirect, third-party network adapters, solid-state drives (SSDs), and other devices can directly read and write CUDA host and device memory. GPUDirect eliminates unnecessary system memory copies, dramatically lowers CPU overhead, and reduces latency, resulting in significant improvements in data transfer times for applications running on NVIDIA Tesla™ and Quadro™ products.

For more information, see the GPUDirect Technology Overview.

Key Features:

  • Accelerated communication with network and storage devices
    Network and GPU device drivers can share “pinned” (page-locked) buffers, eliminating the need to make a redundant copy in CUDA host memory (a short application-side sketch follows this list).
  • Peer-to-Peer Transfers between GPUs
    Use high-speed DMA transfers to copy data between the memories of two GPUs on the same system/PCIe bus.
  • Peer-to-Peer memory access
    Optimize communication between GPUs using NUMA-style access to memory on other GPUs from within CUDA kernels.
  • RDMA
    Eliminate CPU bandwidth and latency bottlenecks using remote direct memory access (RDMA) transfers between GPUs and other PCIe devices, resulting in significantly improved MPI send/receive efficiency between GPUs in different nodes.
  • GPUDirect for Video
    Optimized pipeline for frame-based devices such as frame grabbers, video switchers, HD-SDI capture, and CameraLink devices.
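
The sketch below illustrates the first feature from an application's point of view. It is a minimal example, assuming CUDA 4.0 or later, that page-locks an ordinary host allocation with cudaHostRegister so the GPU's DMA engine can read and write it directly; the buffer size is arbitrary, and sharing the same pinned region with a NIC or storage controller additionally requires that vendor's GPUDirect-enabled driver stack. Error handling is omitted for brevity.

    /* Pin (page-lock) an existing host buffer so the GPU can DMA to and
       from it directly. With GPUDirect v1, a GPUDirect-enabled network or
       storage driver can share the same pinned buffer, avoiding a
       redundant copy in host memory. */
    #include <cuda_runtime.h>
    #include <stdlib.h>

    int main(void)
    {
        const size_t bytes = 1 << 20;        /* 1 MiB transfer buffer */

        /* An ordinary pageable allocation, e.g. one also handed to the NIC driver. */
        void *host_buf = malloc(bytes);

        /* Page-lock the buffer so the GPU can access it without a staging copy. */
        cudaHostRegister(host_buf, bytes, cudaHostRegisterPortable);

        void *dev_buf = NULL;
        cudaMalloc(&dev_buf, bytes);

        /* Asynchronous copy straight out of the pinned buffer. */
        cudaMemcpyAsync(dev_buf, host_buf, bytes, cudaMemcpyHostToDevice, 0);
        cudaStreamSynchronize(0);

        cudaFree(dev_buf);
        cudaHostUnregister(host_buf);
        free(host_buf);
        return 0;
    }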

The diagrams below show how GPUDirect technologies work.



GPUDirect™ Support for RDMA, Introduced with CUDA 5 (2012)



GPUDirect™ v1 Support for Accelerated Communication with Network and Storage Devices (2010)



NVIDIA GPUDirect v2 Peer-to-Peer (P2P) Communication Between GPUs on the Same PCIe Bus (2011)


How Do I Get GPUDirect?

GPUDirect accelerated communication with network and storage devices is supported on Tesla datacenter products running Red Hat Enterprise Linux (RHEL). Check the documentation for possible support on other Linux distributions.

You may also need to install updated drivers for InfiniBand adapters using GPUDirect v1.0. Use the link below or contact your InfiniBand vendor directly:

  • For the OFED Driver Image for Mellanox ConnectX-2 InfiniBand adapters, contact hpc@mellanox.com

GPUDirect peer-to-peer transfers and memory access are supported natively by the CUDA driver. All you need is CUDA Toolkit v4.0 and R270 drivers (or later) and a system with two or more Fermi- or Kepler-architecture GPUs on the same PCIe bus. For more information on using GPUDirect communication in your applications, see the CUDA Toolkit documentation.
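
As a concrete starting point, here is a minimal peer-to-peer sketch (device indices and buffer size are arbitrary, and error checking is omitted): it verifies that GPU 0 and GPU 1 can reach each other, enables peer access in both directions, and copies a buffer directly between the two GPUs without staging through host memory.

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int can01 = 0, can10 = 0;
        cudaDeviceCanAccessPeer(&can01, 0, 1);   /* can device 0 reach device 1? */
        cudaDeviceCanAccessPeer(&can10, 1, 0);   /* and the reverse direction?   */
        if (!can01 || !can10) {
            printf("Peer-to-peer access is not available between GPU 0 and GPU 1.\n");
            return 1;
        }

        const size_t bytes = 1 << 20;
        void *buf0 = NULL, *buf1 = NULL;

        cudaSetDevice(0);
        cudaMalloc(&buf0, bytes);
        cudaDeviceEnablePeerAccess(1, 0);        /* let device 0 access device 1 */

        cudaSetDevice(1);
        cudaMalloc(&buf1, bytes);
        cudaDeviceEnablePeerAccess(0, 0);        /* let device 1 access device 0 */

        /* Direct GPU-to-GPU copy over PCIe; no round trip through host memory. */
        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);

        /* With peer access enabled, a kernel running on device 1 could also
           dereference buf0 directly (NUMA-style peer memory access). */

        cudaSetDevice(0); cudaFree(buf0);
        cudaSetDevice(1); cudaFree(buf1);
        return 0;
    }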

GPUDirect support for RDMA is available now in the latest CUDA Toolkit.
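
To give a sense of how applications typically use it, the sketch below assumes a CUDA-aware MPI library built with GPUDirect RDMA support (for example, over Mellanox InfiniBand). Such a library accepts device pointers directly, so GPU memory can move between nodes without staging through host memory; the message size and ranks here are placeholders.

    /* Run with at least two ranks, e.g. mpirun -np 2. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int count = 1 << 20;               /* floats exchanged per message */
        float *dev_buf = NULL;
        cudaMalloc((void **)&dev_buf, count * sizeof(float));

        if (rank == 0) {
            /* Device pointer passed straight to MPI; with GPUDirect RDMA the
               InfiniBand adapter reads the GPU memory directly. */
            MPI_Send(dev_buf, count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(dev_buf, count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        cudaFree(dev_buf);
        MPI_Finalize();
        return 0;
    }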

Frequently Asked Questions

Q: My company makes network adapters / storage devices. How do we enable our products for GPUDirect?
A: Please contact us at gpudirect@nvidia.com for more information.

Q: Where can I get more information about GPUDirect support for RDMA?
A: API documentation for Linux driver developers interested in integrating RDMA support is available in the CUDA Toolkit and online.