Abstract

MPI intergroup collective communication defines message transfer patterns between two disjoint groups of MPI processes. Such patterns occur in coupled applications and in modern scientific application workflows, often with large data sizes. However, current implementations in production MPI libraries adopt the "root gathering algorithm", which does not achieve optimal communication transfer time. In this paper, we propose algorithms for the intergroup Allgather and Allgatherv communication operations under single-port communication constraints. We implement the new algorithms using MPI point-to-point and standard intra-communicator collective communication functions, and we evaluate their performance on the Cori supercomputer at NERSC. Using message sizes per compute node ranging from 64 KB to 8 MB, our experiments show significant performance improvements of up to 23.67 times on 256 compute nodes compared with the implementations of production MPI libraries.
