Abstract

MPI is the de facto standard communication library for parallel applications on distributed-memory architectures. The performance of collective operations is critical in HPC applications, as these operations can become the bottleneck of an execution. The advent of larger node sizes in multicore clusters has motivated the exploration of hierarchical collective algorithms that are aware of process placement in the cluster and of the memory hierarchy. This work analyzes and compares several hierarchical collective algorithms from the literature that are not part of the current MPI standard. We implement the algorithms on top of OpenMPI, using the shared-memory facility provided by MPI-3 at the intra-node level, and evaluate them on ARM-based multicore clusters. Our results highlight aspects of the algorithms that affect their performance and applicability. Finally, we propose a model that helps us analyze the scalability of the algorithms.
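As a point of reference for the intra-node/inter-node split described above, the sketch below builds the two communicators on which such hierarchical collectives typically operate: MPI_Comm_split_type with MPI_COMM_TYPE_SHARED (the MPI-3 facility mentioned in the abstract) yields a per-node shared-memory communicator, and a second split selects one leader per node. This is a minimal illustrative sketch, not the paper's implementation; the variable names and the rank-0 leader rule are our own assumptions.

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Intra-node communicator: groups processes that can share memory.
       MPI_COMM_TYPE_SHARED is part of the MPI-3 standard. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* Inter-node (leader) communicator: one representative per node.
       Non-leaders pass MPI_UNDEFINED and receive MPI_COMM_NULL. */
    MPI_Comm leader_comm;
    MPI_Comm_split(MPI_COMM_WORLD,
                   node_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &leader_comm);

    /* A hierarchical collective (e.g., a reduce) could then proceed as:
       1) reduce within node_comm (possibly via an MPI-3 shared window),
       2) reduce across leader_comm,
       3) broadcast the result within node_comm if required. */

    if (leader_comm != MPI_COMM_NULL) MPI_Comm_free(&leader_comm);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```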
