Abstract

Communication degrades the performance of parallel applications that run on thousands of CPU cores and exchange large quantities of data. The high communication cost is usually attributed to the mismatch between the communication patterns of parallel applications and the physical topology graphs of the computing resources (that is, the underlying network topologies). Topology-aware process mapping can usually find a better embedding of processes onto resources and thereby improve communication performance. Many existing mapping methods based on heuristic search have high execution times for large-scale applications. Some low-cost mapping methods based on graph partitioning assume that the allocated resources form a regular structure, which is usually impractical in high performance computing (HPC) systems shared by multiple users and applications, and this assumption weakens their performance. Other graph-partitioning-based mapping methods either come at a high cost or require users to provide the network structure information. To address these issues, this paper presents a topology-aware process mapping method with quadratic time complexity. The experimental results show that the proposed method often achieves better application communication performance than several state-of-the-art mapping methods on a shared HPC system, while maintaining a significantly lower execution cost. Moreover, real-world scientific application proxies achieve an execution time reduction of up to 14.60% at a scale of 512 processes, compared with the system default process placement on the TianHe-2 HPC system.
