Abstract

Increasing node and cores-per-node counts in supercomputers render scheduling and load balancing critical for exploiting parallelism. OpenMP applications can achieve high performance via careful selection of the scheduling <monospace>kind</monospace> and <monospace>chunk</monospace> parameters on a per-loop, per-application, and per-system basis from a portfolio of advanced scheduling algorithms (Korndörfer <i>et al.</i>, 2022). This manual selection is time-consuming and challenging, and the best choice may change during execution. We propose <b>Auto4OMP</b>, a novel approach for automated load balancing of OpenMP applications. With Auto4OMP, we introduce three scheduling <i>algorithm selection methods</i> and an <i>expert-defined chunk parameter</i> for the <monospace>kind</monospace> and <monospace>chunk</monospace> parameters of OpenMP's <monospace>schedule</monospace> clause, respectively. Auto4OMP extends the OpenMP <monospace>schedule(auto)</monospace> and <i>chunk</i> parameter implementation in LLVM's OpenMP runtime library to automatically select a scheduling algorithm and calculate a chunk parameter during execution. Auto4OMP infers loop characteristics from the loop's execution over the application's time-steps. The experiments performed in this work show that Auto4OMP improves application performance by up to <inline-formula><tex-math notation="LaTeX">$11\%$</tex-math></inline-formula> compared to LLVM's <monospace>schedule(auto)</monospace> implementation and outperforms manual selection. Auto4OMP improves the performance of MPI+OpenMP applications by <i>explicitly</i> minimizing thread-level load imbalance and <i>implicitly</i> reducing process-level load imbalance.
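To make the two parameters concrete, the following minimal C sketch contrasts a manually chosen <monospace>kind</monospace> and <monospace>chunk</monospace> with <monospace>schedule(auto)</monospace>, which delegates the choice to the runtime. The function names, the synthetic workload, and the chunk size of 16 are illustrative assumptions, not taken from the paper. With a stock OpenMP runtime the behavior of <monospace>schedule(auto)</monospace> is implementation-defined; the automated selection described above would apply only when linking against a runtime that provides it.

```c
#include <assert.h>

/* Hypothetical workload: the cost of iteration i varies with i,
 * so an even static split would leave threads unevenly loaded. */
static double iteration_work(long i) {
    double acc = 0.0;
    for (long k = 0; k <= i % 64; ++k) {
        acc += (double)k;
    }
    return acc;
}

/* Manual selection: the programmer fixes kind=dynamic and chunk=16.
 * The chunk size 16 is an arbitrary illustrative value. */
double sum_manual(long n) {
    assert(n >= 0);
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += iteration_work(i);
    }
    return total;
}

/* Automated selection: schedule(auto) leaves the scheduling decision
 * to the compiler and runtime; no kind or chunk appears in the source. */
double sum_auto(long n) {
    assert(n >= 0);
    double total = 0.0;
    #pragma omp parallel for schedule(auto) reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += iteration_work(i);
    }
    return total;
}
```

Both functions compute the same result; only the mapping of iterations to threads differs. Compiled without OpenMP support, the pragmas are ignored and the loops run serially.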

