Whether you are exploring mountains of geological data, researching solutions to complex scientific problems, training neural networks, or racing to model fast-moving financial markets, you need a computing platform that provides the highest throughput and lowest latency possible. GPUs are widely recognized for providing the tremendous horsepower required by compute-intensive workloads. However, GPUs consume data much faster than CPUs, and as the computing horsepower of GPUs increases, so does the demand for I/O bandwidth.
Using GPUDirect, multiple GPUs, network adapters, solid-state drives (SSDs), and now NVMe drives can directly read and write CUDA host and device memory. This eliminates unnecessary memory copies, dramatically lowers CPU overhead, and reduces latency, resulting in significantly faster data transfers for applications running on NVIDIA Tesla™ and Quadro™ products.
Innovations in GPUDirect
GPUDirect Storage enables a direct path to transfer data between GPU memory and storage devices, like NVMe or NVMe-oF.
GPUDirect RDMA (remote direct memory access) enables network devices to directly access GPU memory, bypassing CPU host memory altogether.
GPUDirect for Video offers an optimized pipeline for frame-based devices such as frame grabbers, video switchers, HD-SDI capture, and CameraLink devices to efficiently transfer video frames in and out of NVIDIA GPU memory.
GPUDirect Peer to Peer allows GPUs to use high-speed DMA transfers to directly load and store data between the memories of two GPUs.
GPUDirect Shared Access provided support for accelerated communication with third-party PCI Express device drivers via shared pinned host memory (deprecated).
For more information, see the GPUDirect Technology Overview.
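To make the Peer to Peer path above concrete, here is a minimal sketch using the CUDA runtime's peer-access APIs. It assumes a system with two P2P-capable GPUs (devices 0 and 1) connected by PCIe or NVLink; the buffer size is illustrative:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int canAccess01 = 0, canAccess10 = 0;
    // Ask the driver whether each GPU can directly address the other's memory.
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
    if (!canAccess01 || !canAccess10) {
        fprintf(stderr, "GPUs 0 and 1 are not P2P-capable on this system\n");
        return 1;
    }

    const size_t bytes = 1 << 20;  // 1 MiB, arbitrary for illustration
    float *src = NULL, *dst = NULL;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // let device 0 map device 1's memory
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);
    cudaMalloc(&dst, bytes);

    // Direct GPU-to-GPU DMA transfer: no staging copy through CPU host memory.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();

    printf("Copied %zu bytes from GPU 0 to GPU 1 via P2P\n", bytes);
    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```

When peer access is enabled, a kernel on one device can also dereference pointers into the other device's memory directly, not just via `cudaMemcpyPeer`.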
GPUDirect Storage is in development with NDA partners and will be available to application developers in a future CUDA Toolkit version. If you are a member of the NVIDIA developer program and would like to be notified when we share additional information, please fill out this form.
GPUDirect RDMA is available in CUDA Toolkit version 6 and later. You may also need to install updated drivers for adapters that use GPUDirect. Please use the links below or contact your InfiniBand or iWARP vendor directly:
- For the OFED Driver Image for Mellanox InfiniBand adapters, contact firstname.lastname@example.org
- For the OFED Driver Image for Chelsio iWARP adapters, contact email@example.com
GPUDirect Peer to Peer is supported natively by the CUDA Driver. Developers should use the latest CUDA Toolkit and drivers on a system with two or more compatible devices. For more information, please see:
- RDMA for GPUDirect documentation
- How to build Open MPI with CUDA-aware support
- CUDA and GPUDirect RDMA support in OpenMPI
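As a hedged illustration of how applications typically consume GPUDirect RDMA, a CUDA-aware MPI library (such as an Open MPI build with CUDA support, per the resources above) accepts device pointers directly in communication calls, letting the network adapter move data to and from GPU memory without a host bounce buffer. A minimal two-rank sketch, assuming one GPU per MPI rank:

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;        // element count, arbitrary for illustration
    float *dbuf = NULL;
    cudaSetDevice(rank);          // assumes one GPU per rank on this node
    cudaMalloc(&dbuf, n * sizeof(float));

    // With a CUDA-aware MPI, device pointers can be passed straight to
    // MPI_Send/MPI_Recv; with GPUDirect RDMA the NIC reads and writes
    // GPU memory directly, bypassing CPU host memory.
    if (rank == 0)
        MPI_Send(dbuf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(dbuf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}
```

Without CUDA-aware MPI, the same exchange would require explicit `cudaMemcpy` calls to staging buffers in host memory on both sides.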
GTC 2019 Sessions:
- Support for GPUs with GPUDirect RDMA in MVAPICH2, by DK Panda, OSU
- Accelerating High Performance Computing with GPUDirect RDMA
- Test Driving GPUDirect RDMA with MVAPICH2-GDR and Open MPI, by Pak Liu, Mellanox
- Mellanox OFED GPUDirect RDMA plug-in kernel module
- An Introduction to CUDA-Aware MPI, by Jiri Kraus, NVIDIA
- GDRCopy: a low-latency GPU memory copy library based on GPUDirect RDMA (GitHub)
Frequently Asked Questions
Q: My company makes network adapters / storage devices. How do we enable our products for GPUDirect?
A: Please contact us for more information at firstname.lastname@example.org
Q: Where can I get more information about GPUDirect support for RDMA?
A: API documentation for Linux driver developers interested in integrating RDMA support is available in the CUDA Toolkit and online.
Q: Where can I get more information about GPUDirect Storage?
A: Members of the NVIDIA developer program can sign up here to be notified when we have more information to share.