Abstract

Communication and topology-aware process mapping is a powerful approach to reduce communication time in parallel applications with known communication patterns on large, distributed memory systems. We address the problem as a quadratic assignment problem (QAP) and present algorithms to construct initial mappings of processes to processors and fast local search algorithms to further improve the mappings. By exploiting assumptions that typically hold for applications and modern supercomputer systems such as sparse communication patterns and hierarchically organized communication systems, we obtain significantly more powerful algorithms for these special QAPs. Our multilevel construction algorithms employ perfectly balanced graph partitioning techniques and exploit the given communication system hierarchy in significant ways. We present improvements to a local search algorithm of Brandfass et al. (2013) and further decrease the running time by reducing the time needed to perform swaps in the assignment as well as by carefully constraining local search neighborhoods. We also investigate different algorithms to create the communication graph that is mapped onto the processor network. Experiments indicate that our algorithms not only dramatically speed up local search but also, due to the multilevel approach, find much better solutions in practice.

Highlights

  • Communication performance between processes in high-performance systems depends on many factors

  • We focus on sparse communication patterns, and do not want to store the complete communication matrix but instead represent it more efficiently as a graph

  • Our experiments evaluate the objective of the quadratic assignment problem as well as the running time necessary to compute the solution

Summary

Introduction

Communication performance between processes in high-performance systems depends on many factors. Given the communication pattern between processes and a hardware topology description that reflects the quality of the communication links, one seeks a mapping of processes onto processors such that pairs of processes exchanging large amounts of data are placed close to each other. Such a mapping can be computed by solving a corresponding quadratic assignment problem (QAP), which is an NP-hard optimization problem. We assume that the hardware communication topology under consideration is hierarchical, with communication links on the same level of the hierarchy having the same communication speed. This is typically observed in current high-performance systems. Experiments indicate that our algorithms not only drastically speed up local search but also, due to the multilevel approach that employs recently developed high-quality partitioning techniques, find better solutions in practice.
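
To make the objective concrete, the following is a minimal sketch of how a sparse QAP cost and the effect of a single swap might be evaluated. It is an illustration under stated assumptions, not the implementation described in the paper: the names `hierarchical_distance`, `mapping_cost` and `swap_delta`, the adjacency-dictionary graph layout, and the `fanouts`/`costs` description of the hierarchy are all hypothetical. The sketch assumes a symmetric communication graph (each edge stored in both directions) and a uniform hierarchy in which the distance between two processors depends only on the lowest level they share.

```python
from typing import Callable, Dict, List

# Assumed layout: process -> {neighbouring process: communication volume}.
# Each undirected edge is stored in both directions.
Graph = Dict[int, Dict[int, float]]
Dist = Callable[[int, int], float]


def hierarchical_distance(p: int, q: int,
                          fanouts: List[int], costs: List[float]) -> float:
    """Distance between processors p and q in a uniform hierarchy.

    fanouts[0] is the number of processors per lowest-level group,
    fanouts[1] the number of such groups per next-level group, and so on.
    costs[i] is the distance when level i is the lowest level shared.
    """
    if p == q:
        return 0.0
    group = 1
    for fanout, cost in zip(fanouts, costs):
        group *= fanout
        if p // group == q // group:
            return cost
    raise ValueError("processors do not share any level of the hierarchy")


def mapping_cost(comm: Graph, mapping: List[int], dist: Dist) -> float:
    """QAP objective: sum of volume * distance over all stored (directed) edges.

    Only edges present in the sparse graph are visited, so the work is
    proportional to the number of edges rather than n^2.
    """
    total = 0.0
    for u, neighbours in comm.items():
        for v, volume in neighbours.items():
            total += volume * dist(mapping[u], mapping[v])
    return total


def swap_delta(comm: Graph, mapping: List[int], dist: Dist,
               u: int, v: int) -> float:
    """Change in mapping_cost if processes u and v exchange processors.

    Only the neighbours of u and v are inspected. Assumes a symmetric
    graph and a symmetric distance function.
    """
    pu, pv = mapping[u], mapping[v]
    delta = 0.0
    for x, volume in comm[u].items():
        if x != v:  # the edge u-v keeps its length under the swap
            delta += volume * (dist(pv, mapping[x]) - dist(pu, mapping[x]))
    for x, volume in comm[v].items():
        if x != u:
            delta += volume * (dist(pu, mapping[x]) - dist(pv, mapping[x]))
    return 2.0 * delta  # each undirected edge is counted in both directions
```

In a local search, a swap would be accepted when `swap_delta` is negative, after which `mapping[u]` and `mapping[v]` are exchanged. Evaluating the change from the neighbourhoods of the two swapped processes alone is what keeps each step cheap on sparse communication patterns.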

Preliminaries
Rank Reordering Algorithms
Initial Solutions
Faster Swapping
Alternative Local Search Spaces
Miscellanea
Experiments
Sparse Quadratic Assignment Problem
Speed-Up of Local Search
Local Search Neighborhoods
Initial Heuristics and Their Scaling Behaviour
Findings
Conclusion
