Abstract

Driven by recent progress in artificial intelligence (AI), a growing number of applications exhibit all-to-all, irregular, or unpredictable communication patterns among compute nodes in high-performance computing (HPC) systems. Traditional communication infrastructures, e.g., torus or fat-tree interconnection networks, may not match these newly emerging applications well. Many communication-efficient application mapping algorithms already exist for these typical non-random network topologies. However, when communication patterns are unpredictable, it is difficult to map applications efficiently onto non-random topologies. In this case, a simple optimization is to map each application so that the diameter or average shortest path length (ASPL) among its assigned compute nodes is small. In this context, we recommend using random network topologies, which have drawn increasing attention as HPC interconnects, as the communication infrastructure. In this study, we compare the performance impact of application mapping on non-random and random network topologies. We list several application mapping policies and compare their job scheduling performance, assuming that the communication patterns are unpredictable to the computing system. Evaluations with a large compound application workload show that, compared with non-random topologies, random topologies can reduce the average turnaround time by up to 39.3% with a random connected mapping method and by up to 72.1% with a diameter/ASPL-based mapping method.
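
To make the two mapping metrics concrete, the following minimal Python sketch computes the diameter and ASPL among a set of assigned compute nodes. It is an illustration under stated assumptions, not the evaluation code of this study: the function names `bfs_hops` and `diameter_and_aspl`, the adjacency-dictionary representation, and the 8-node ring are all hypothetical, and hop counts are assumed to be measured in the full network, so shortest paths may pass through nodes outside the assignment.

```python
from collections import deque
from itertools import combinations


def bfs_hops(adj, src):
    """Return the hop count from src to every node reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_aspl(adj, assigned):
    """Diameter and ASPL over all pairs of assigned nodes.

    Assumes a connected network and at least two assigned nodes.
    Distances are taken in the whole network (an assumption of this sketch).
    """
    hops = {u: bfs_hops(adj, u) for u in assigned}
    pair_hops = [hops[u][v] for u, v in combinations(assigned, 2)]
    return max(pair_hops), sum(pair_hops) / len(pair_hops)


# Hypothetical 8-node ring used only as a toy non-random topology.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}

compact = diameter_and_aspl(ring, [0, 1, 2])  # three adjacent nodes
spread = diameter_and_aspl(ring, [0, 2, 4])   # every other node
```

On this toy ring, the compact allocation has diameter 2 and ASPL 4/3, while the spread allocation has diameter 4 and ASPL 8/3. A diameter/ASPL-based mapping policy would prefer the compact one when the communication pattern is not known in advance.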
