Abstract

The Message Passing Interface (MPI) is a crucial programming tool for enabling communication between processes in parallel applications. MPI users aim to allocate tasks to processors in a way that maximizes both spatial and temporal locality in the network. However, this can be challenging, especially in large-scale networks where maximizing processor locality may not be feasible at runtime. To address this issue, we propose Hamorder, an offline node reassignment approach based on graph reordering that takes physical processor locations into account for Random network topologies. Hamorder aims to optimize task mapping for improved performance in parallel applications, whether across multiple tasks or within a single task. Additionally, we investigate the potential of improving MPI applications through runtime parameter tuning based on Hamorder. Our evaluation shows that on Random topologies Hamorder delivers a 27.3% performance improvement over Gorder, a state-of-the-art algorithm that enhances cache locality by rearranging the vertices of a graph so that vertices typically accessed together are placed in close proximity. Moreover, our Hamorder-based autotuning framework achieves an average speedup of 1.38x for the targeted MPI applications by searching through various runtime parameter combinations.
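To make the locality-aware mapping idea concrete, the sketch below shows a minimal, greedy rank-to-node reassignment in Python. It is not the paper's Hamorder algorithm; it only illustrates the general principle the abstract describes: given an offline communication profile and the physical distances between nodes, place heavily communicating ranks on nearby nodes. All names (greedy_locality_mapping, comm_volume, node_distance) and the greedy strategy itself are illustrative assumptions.

```python
# Hypothetical sketch of locality-aware rank-to-node mapping.
# NOT the Hamorder algorithm; it only illustrates reassigning MPI ranks
# so that heavily communicating ranks land on physically close nodes.
import numpy as np

def greedy_locality_mapping(comm_volume, node_distance):
    """Greedily place ranks on nodes (one rank per node).

    comm_volume[i][j]  : traffic between ranks i and j, profiled offline (numpy array)
    node_distance[a][b]: hop count between physical nodes a and b (numpy array)
    Returns placement, where placement[rank] = node index.
    """
    n = len(comm_volume)
    placement = [-1] * n
    free_nodes = set(range(n))

    # Seed with the rank that communicates the most overall.
    order = np.argsort(-comm_volume.sum(axis=1))
    placement[order[0]] = 0
    free_nodes.discard(0)

    for rank in order[1:]:
        placed = [r for r in range(n) if placement[r] >= 0]
        # Choose the free node minimizing traffic-weighted distance
        # to the partners that are already placed.
        best_node = min(
            free_nodes,
            key=lambda node: sum(
                comm_volume[rank][p] * node_distance[node][placement[p]]
                for p in placed
            ),
        )
        placement[rank] = best_node
        free_nodes.discard(best_node)
    return placement
```

A greedy pass like this is only one possible heuristic; the paper's approach additionally ties the reordering to the structure of Random topologies and feeds the resulting placement into runtime parameter tuning.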
