MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters

Hao Wang, Miao Luo, Sreeram Potluri, Sayantan Sur, Dhabaleswar K Panda, Ashish Kumar Singh

doi:10.1007/s00450-011-0171-3

Abstract

Data parallel architectures, such as General Purpose Graphics Processing Units (GPGPUs), have seen a tremendous rise in their application for High End Computing. However, data movement in and out of GPGPUs remains the biggest hurdle to overall performance and programmer productivity. Applications executing on a cluster with GPUs have to manage data movement using CUDA in addition to MPI, the de facto parallel programming standard. Currently, data movement with CUDA and MPI libraries is not integrated and is not as efficient as possible. In addition, MPI-2 one-sided communication does not work for windows in GPU memory, as there is no way to remotely get or put data from GPU memory in a one-sided manner. In this paper, we propose a novel MPI design that integrates CUDA data movement transparently with MPI. The programmer is presented with one MPI interface that can communicate to and from GPUs. Data movement between the GPU and the network can now be overlapped. The proposed design is incorporated into the MVAPICH2 library. To the best of our knowledge, this is the first work of its kind to enable advanced MPI features and optimized pipelining in a widely used MPI library. We observe up to 45% improvement in one-way latency. In addition, we show that collective communication performance can be improved significantly: 32%, 37% and 30% improvement for Scatter, Gather and Alltoall collective operations, respectively. Further, we enable MPI-2 one-sided communication with GPUs. We observe up to 45% improvement for Put and Get operations.
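To illustrate why overlapping GPU copies with network transfers can reduce latency, the following is a minimal timing sketch, not the MVAPICH2 implementation. It assumes a message passes through a fixed sequence of stages (for example device-to-host copy, network transfer, host-to-device copy), that each stage has a constant bandwidth, and that splitting the message into chunks adds no per-chunk overhead. The function names and the bandwidth values are hypothetical and are not taken from the paper.

```python
def staged_time(total_bytes, stage_bandwidths):
    """Time when each stage moves the whole message before the next stage starts."""
    return sum(total_bytes / bw for bw in stage_bandwidths)


def pipelined_time(total_bytes, stage_bandwidths, num_chunks):
    """Time when the message is split into equal chunks and the stages overlap.

    Assumes constant per-stage bandwidth and zero per-chunk startup cost.
    """
    chunk = total_bytes / num_chunks
    per_chunk = [chunk / bw for bw in stage_bandwidths]
    # The first chunk crosses every stage in turn; each later chunk then
    # completes one slot of the slowest stage after the previous chunk.
    return sum(per_chunk) + (num_chunks - 1) * max(per_chunk)


if __name__ == "__main__":
    # Hypothetical bandwidths in bytes per second, chosen only for illustration.
    bandwidths = [4e9, 3e9, 4e9]
    size = 4 * 2**20  # 4 MiB message
    print("staged   :", staged_time(size, bandwidths))
    print("pipelined:", pipelined_time(size, bandwidths, num_chunks=16))
```

Under these assumptions the pipelined time approaches the time of the slowest stage alone as the number of chunks grows. In a real system, per-chunk startup costs limit how small the chunks can usefully be, which this sketch deliberately ignores.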


Journal: Computer Science - Research and Development
Publication Date: Apr 12, 2011
Citations: 114

