Abstract

The continuous scaling-up of high-performance computing (HPC) systems has made debugging and tuning large-scale parallel programs more challenging. First, to locate bugs in a program or tune its performance, programmers often need to execute the program repeatedly at a specified scale, which consumes massive resources. Second, because job scheduling systems are used extensively, programmers can only submit their programs as jobs and cannot interact with them, which restricts debugging efficiency and flexibility. To address these challenges, this paper proposes an emulation system that supports debugging and tuning of large-scale parallel programs by executing them at the desired scale on a small cluster. The program is first executed at the desired scale on the target HPC system to record the necessary information; programmers can then choose a subset of the program's processes and re-execute it repeatedly on a small cluster. During re-execution, the emulation system controls the execution of the processes, and programmers can debug their programs by attaching tools to the selected processes. Moreover, the system supports the popular CPU+GPU heterogeneous architecture. The system is evaluated on a small cluster, with a 1000-node system used as the target HPC system; experimental results demonstrate the accuracy and efficiency of emulation-execution.

