MVAPICH2 is an open source implementation of the Message Passing Interface (MPI) designed for high performance, scalability, and fault tolerance on high-end computing systems and servers that use InfiniBand, 10GigE/iWARP, and RoCE networking technologies. It simplifies porting MPI applications to clusters with NVIDIA GPUs by accepting buffers in GPU device memory in standard MPI calls. The library optimizes data movement between host and GPU, and between GPUs, while requiring minimal or no effort from the application developer.
MVAPICH2 provides high-performance, RDMA-based inter-node MPI point-to-point communication to and from GPU device memory (GPU-GPU, GPU-Host, and Host-GPU).
The latest performance results for MPI communication to, from, and between GPU devices using MVAPICH2 can be found on the OSU Microbenchmark Page for GPUs.
The latest version of MVAPICH2 can be downloaded from http://mvapich.cse.ohio-state.edu/register/. NVIDIA GPU-related features are available in MVAPICH2 releases starting with version 1.8.