Automated Scheduling Algorithm Selection and Chunk Parameter Calculation in OpenMP

Ali Mohammed, Florina M. Ciorba, Jonas H. Müller Korndörfer, Ahmed Eleliemy

doi:10.1109/tpds.2022.3189270

Abstract

Increasing node and cores-per-node counts in supercomputers render scheduling and load balancing critical for exploiting parallelism. OpenMP applications can achieve high performance via careful selection of the scheduling <monospace>kind</monospace> and <monospace>chunk</monospace> parameters on a per-loop, per-application, and per-system basis from a portfolio of advanced scheduling algorithms (Korndörfer et al., 2022). This manual selection is time-consuming and challenging, and the best choice may need to change during execution. We propose Auto4OMP, a novel approach for automated load balancing of OpenMP applications. With Auto4OMP, we introduce three scheduling algorithm selection methods and an expert-defined chunk parameter for the <monospace>kind</monospace> and <monospace>chunk</monospace> arguments of OpenMP's <monospace>schedule</monospace> clause, respectively. Auto4OMP extends the OpenMP <monospace>schedule(auto)</monospace> and chunk parameter implementation in LLVM's OpenMP runtime library to automatically select a scheduling algorithm and calculate a chunk parameter during execution. Auto4OMP infers loop characteristics from the execution of each loop over the application's time-steps. The experiments performed in this work show that Auto4OMP improves application performance by up to <inline-formula><tex-math notation="LaTeX">$11\%$</tex-math></inline-formula> compared to LLVM's <monospace>schedule(auto)</monospace> implementation and outperforms manual selection. Auto4OMP improves the performance of MPI+OpenMP applications by explicitly minimizing thread-level load imbalance and implicitly reducing process-level load imbalance.
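To make the two <monospace>schedule</monospace> arguments concrete, the following minimal C sketch contrasts a manually chosen <monospace>kind</monospace> and <monospace>chunk</monospace> with <monospace>schedule(auto)</monospace>, which delegates the decision to the compiler and runtime. It uses only standard OpenMP directives and does not show Auto4OMP's internals. The kernel <monospace>iteration_work</monospace>, the function names, and the chunk value 16 are hypothetical and chosen for illustration only.

```c
#include <assert.h>

/* Hypothetical kernel: iteration i performs i + 1 inner steps,
 * so later iterations cost more and the loop is load-imbalanced. */
long iteration_work(int i) {
    long acc = 0;
    for (int k = 0; k <= i; ++k)
        acc += k % 7;
    return acc;
}

/* Manual selection: the programmer fixes kind = dynamic and chunk = 16.
 * Both values are illustrative assumptions, not recommendations. */
long sum_manual(int n) {
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += iteration_work(i);
    return total;
}

/* Delegated selection: schedule(auto) leaves the choice of scheduling
 * algorithm to the OpenMP implementation. */
long sum_auto(int n) {
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for schedule(auto) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += iteration_work(i);
    return total;
}
```

Both functions return the same sum for a given <monospace>n</monospace>; only the assignment of iterations to threads differs. Compiling with GCC or Clang requires <monospace>-fopenmp</monospace> to enable the directives; without it the pragmas are ignored and the loops run serially.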

Full Text