Whether you are exploring mountains of geological data, researching solutions to complex scientific problems, or racing to model fast-moving financial markets, you need a computing platform that delivers the highest throughput and lowest latency possible. GPU-accelerated clusters and workstations are widely recognized for providing the tremendous horsepower required by compute-intensive workloads. Compute-intensive applications can provide even faster results with NVIDIA GPUDirect™.
Using GPUDirect, multiple GPUs, third-party network adapters, solid-state drives (SSDs), and other devices can directly read and write CUDA host and device memory. This eliminates unnecessary memory copies, dramatically lowers CPU overhead, and reduces latency, which significantly shortens data transfer times for applications running on NVIDIA Tesla™ and Quadro™ products.
GPUDirect is a family of technologies that continues to evolve to increase performance and broaden its usability. First introduced in June 2010, GPUDirect Shared Access supports accelerated communication with third-party PCI Express device drivers via shared pinned host memory. In 2011, GPUDirect Peer-to-Peer added support for transfers and direct load/store access between GPUs on the same PCI Express root complex. Announced in 2013, GPUDirect RDMA enables third-party PCI Express devices to access GPU memory directly, bypassing CPU host memory altogether.
For more technical information, see the GPUDirect Technology Overview.
The diagrams below show how GPUDirect technologies work.
GPUDirect™ Support for RDMA, Introduced with CUDA 5 (2012)
GPUDirect™ Support for Accelerated Communication with Network and Storage Devices (2010)
NVIDIA GPUDirect Peer-to-Peer (P2P) Communication Between GPUs on the Same PCIe Bus (2011)
How Do I Get GPUDirect?
GPUDirect accelerated communication with network and storage devices is supported on Tesla datacenter products running Red Hat Enterprise Linux (RHEL). Check the documentation for possible support on other Linux distributions.
GPUDirect peer-to-peer transfers and memory access are supported natively by the CUDA Driver. All you need is CUDA Toolkit v4.0 and R270 drivers (or later) and a system with two or more Fermi- or Kepler-architecture GPUs on the same PCIe bus. For more information on using GPUDirect communication in your applications, please see:
- RDMA for GPUDirect Documentation Page
- CUDA C Programming Guide
- simpleP2P code sample in the GPU Computing SDK code samples
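As a rough illustration of the peer-to-peer path, the sketch below uses only CUDA runtime calls (`cudaDeviceCanAccessPeer`, `cudaDeviceEnablePeerAccess`, `cudaMemcpyPeer`) to do two things: a bulk copy from GPU 0 to GPU 1, and a kernel on GPU 1 that reads GPU 0's memory directly (a peer load). It assumes a system with at least two GPUs, uses devices 0 and 1, and relies on unified virtual addressing so that GPU 1 can dereference a pointer allocated on GPU 0. The names `p2p_demo` and `add_one_from_peer` are hypothetical and chosen for illustration only.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Print a CUDA error and return false so the caller can stop early.
static bool check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Runs on GPU 1 and reads GPU 0's buffer directly over PCIe (peer load).
__global__ void add_one_from_peer(const int* peer_src, int* local_dst, size_t n) {
    size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) local_dst[i] = peer_src[i] + 1;
}

// Hypothetical helper. Returns 1 if both the peer copy and the peer load were
// verified, 0 if skipped (n == 0, fewer than two GPUs, or GPU 1 cannot access
// GPU 0), and -1 on a CUDA error or data mismatch.
int p2p_demo(size_t n) {
    int count = 0;
    if (n == 0) return 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) {
        cudaGetLastError();  // clear any error so later CUDA calls start clean
        return 0;
    }
    int can_1_to_0 = 0;
    if (!check(cudaDeviceCanAccessPeer(&can_1_to_0, 1, 0), "cudaDeviceCanAccessPeer")) return -1;
    if (!can_1_to_0) return 0;  // e.g. the GPUs are not peer-capable with each other

    const size_t bytes = n * sizeof(int);
    std::vector<int> host(n);
    for (size_t i = 0; i < n; ++i) host[i] = static_cast<int>(i);

    int *src0 = nullptr, *copy1 = nullptr, *out1 = nullptr;
    bool ok = check(cudaSetDevice(0), "cudaSetDevice(0)")
           && check(cudaMalloc(&src0, bytes), "cudaMalloc on GPU 0")
           && check(cudaMemcpy(src0, host.data(), bytes, cudaMemcpyHostToDevice), "upload");
    if (ok) ok = check(cudaSetDevice(1), "cudaSetDevice(1)");
    if (ok) {
        // Map GPU 0's memory into GPU 1's address space (flags must be 0).
        cudaError_t e = cudaDeviceEnablePeerAccess(0, 0);
        if (e == cudaErrorPeerAccessAlreadyEnabled) { cudaGetLastError(); e = cudaSuccess; }
        ok = check(e, "cudaDeviceEnablePeerAccess");
    }
    ok = ok && check(cudaMalloc(&copy1, bytes), "cudaMalloc copy on GPU 1")
            && check(cudaMalloc(&out1, bytes), "cudaMalloc out on GPU 1")
            // 1) Bulk peer copy GPU 0 -> GPU 1.
            && check(cudaMemcpyPeer(copy1, 1, src0, 0, bytes), "cudaMemcpyPeer");
    if (ok) {
        // 2) Direct peer load: the kernel on GPU 1 dereferences GPU 0's pointer.
        const unsigned threads = 256;
        const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
        add_one_from_peer<<<blocks, threads>>>(src0, out1, n);
        ok = check(cudaGetLastError(), "kernel launch")
          && check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");
    }
    std::vector<int> got_copy(n), got_out(n);
    ok = ok && check(cudaMemcpy(got_copy.data(), copy1, bytes, cudaMemcpyDeviceToHost), "download copy")
            && check(cudaMemcpy(got_out.data(), out1, bytes, cudaMemcpyDeviceToHost), "download out");
    for (size_t i = 0; ok && i < n; ++i)
        ok = got_copy[i] == host[i] && got_out[i] == host[i] + 1;

    // Free each buffer with its owning device current.
    cudaSetDevice(1); cudaFree(copy1); cudaFree(out1);
    cudaSetDevice(0); cudaFree(src0);
    return ok ? 1 : -1;
}
```

Calling `p2p_demo(1 << 20)` from host code built with `nvcc` exercises both paths. A return value of 0 means the system has no peer-capable GPU pair. Note that `cudaMemcpyPeer` still works without peer access enabled, but the copy is then staged through host memory rather than going directly between the GPUs.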
GPUDirect support for RDMA is available in CUDA Toolkit version 6 or later. You may also need to contact your InfiniBand vendor and/or install updated drivers for adapters that use GPUDirect. Please use the links below or contact your InfiniBand vendor directly:
For the OFED Driver Image for Mellanox InfiniBand adapters, contact email@example.com
For more information on GPUDirect RDMA, see:
- RDMA for GPUDirect documentation page
- How to build Open MPI with CUDA-aware support
- CUDA and GPUDirect RDMA support in Open MPI
- Open MPI with RDMA support and CUDA, R. vandeVaart, NVIDIA, GTC 14: slides
- Support for GPUs with GPUDirect RDMA in MVAPICH2, D. K. Panda, OSU, SC 13: slides
- Accelerating High Performance Computing with GPUDirect RDMA: slides
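With a CUDA-aware MPI library, such as the Open MPI and MVAPICH2 builds referenced above, application code passes GPU device pointers directly to MPI calls, and the library can use GPUDirect RDMA underneath when the drivers support it. The sketch below assumes a CUDA-aware MPI build and at least two ranks, each with access to a GPU. The helpers `exchange_device_buffer` and `cuda_or_abort` are hypothetical; the MPI and CUDA runtime calls themselves are standard.

```cuda
#include <cstdio>
#include <vector>
#include <mpi.h>
#include <cuda_runtime.h>

// Abort the whole job on a CUDA error so that no peer rank blocks forever
// in a matching MPI_Send/MPI_Recv.
static void cuda_or_abort(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

// Hypothetical helper: rank 0 sends n ints that live in GPU memory to rank 1.
// ASSUMPTION: MPI was built CUDA-aware; a non-CUDA-aware MPI cannot accept
// device pointers. Every rank must call it after MPI_Init with the same n.
// Returns 1 on rank 0 after sending and on rank 1 after verifying the data,
// 0 if skipped (fewer than two ranks, a rank without a GPU, or rank > 1),
// and -1 on rank 1 if the received data does not match.
int exchange_device_buffer(int n) {
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2 || n <= 0) return 0;

    // All ranks agree that a GPU is present before anyone posts a send/receive.
    int gpus = 0;
    int have_gpu = (cudaGetDeviceCount(&gpus) == cudaSuccess && gpus > 0) ? 1 : 0;
    int all_have_gpu = 0;
    MPI_Allreduce(&have_gpu, &all_have_gpu, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
    if (!all_have_gpu || rank > 1) return 0;

    // Ranks sharing a node get different GPUs when enough are available.
    cuda_or_abort(cudaSetDevice(rank % gpus), "cudaSetDevice");
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int* dbuf = nullptr;
    cuda_or_abort(cudaMalloc(&dbuf, bytes), "cudaMalloc");

    std::vector<int> host(n);
    int result = 1;
    if (rank == 0) {
        for (int i = 0; i < n; ++i) host[i] = i;
        cuda_or_abort(cudaMemcpy(dbuf, host.data(), bytes, cudaMemcpyHostToDevice), "upload");
        // The device pointer goes straight to MPI; no explicit host staging copy.
        MPI_Send(dbuf, n, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(dbuf, n, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cuda_or_abort(cudaMemcpy(host.data(), dbuf, bytes, cudaMemcpyDeviceToHost), "download");
        for (int i = 0; i < n; ++i)
            if (host[i] != i) { result = -1; break; }
    }
    cudaFree(dbuf);
    return result;
}
```

Launch with at least two ranks (for example `mpirun -np 2 ./app`). Whether the transfer actually travels over GPUDirect RDMA, rather than being staged through host memory inside the library, depends on how the MPI library and the adapter drivers are built and configured.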
Blogs & Code Samples:
- Test Driving GPUDirect RDMA with MVAPICH2-GDR and Open MPI by Pak Liu, Mellanox
- An Introduction to CUDA-Aware MPI, by Jiri Kraus, NVIDIA
- GDRCopy: A low-latency GPU memory copy library based on GPUDirect RDMA: GitHub link
Frequently Asked Questions
Q: My company makes network adapters or storage devices. How do we enable our products for GPUDirect?
A: Please contact us for more information at firstname.lastname@example.org
Q: Where can I get more information about GPUDirect support for RDMA?
A: API documentation for Linux driver developers interested in integrating RDMA support is available in the CUDA Toolkit and online.