LB4OMP: A Dynamic Load Balancing Library for Multithreaded Applications

Jonas H Müller Korndörfer, Ahmed Eleliemy, Florina M Ciorba, Ali Mohammed

doi:10.1109/tpds.2021.3107775


Open Access


Abstract

Exascale computing systems will exhibit high degrees of hierarchical parallelism, with thousands of computing nodes and hundreds of cores per node. Efficiently exploiting hierarchical parallelism is challenging due to load imbalance that arises at multiple levels. OpenMP is the most widely used standard for expressing and exploiting the ever-increasing node-level parallelism. The scheduling options in OpenMP are insufficient to address the load imbalance that arises during the execution of multithreaded applications. The limited scheduling options in OpenMP also hinder research on novel scheduling techniques, which require comparison with others from the literature. This work introduces LB4OMP, an open-source dynamic load balancing library that implements successful scheduling algorithms from the literature. LB4OMP is a research infrastructure designed to spur and support present and future scheduling research, for the benefit of multithreaded application performance. Through an extensive performance analysis campaign, we assess the effectiveness and demystify the performance of all loop scheduling techniques in the library. We show that, for numerous application-system pairs, the scheduling techniques in LB4OMP outperform the scheduling options in OpenMP. Node-level load balancing using LB4OMP leads to reduced cross-node load imbalance and to improved performance of MPI+OpenMP applications, which is critical for Exascale computing.
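To make the point about OpenMP's scheduling options concrete, the sketch below lists the chunk sizes handed out by the three standard loop schedule kinds: `schedule(static)`, `schedule(dynamic, chunk)`, and `schedule(guided)`. It is a simplified model, not the LLVM OpenMP runtime's code. The function names are hypothetical, the static split is one common (implementation-defined) choice, and the guided rule uses the classic guided self-scheduling formula ceil(R/P), where R is the number of remaining iterations and P the number of threads. The OpenMP specification only requires guided chunks to be proportional to R/P, so real runtimes may use a different constant.

```python
import math


def static_chunks(n_iters, n_threads):
    """schedule(static) without a chunk size: one near-equal contiguous block per thread.
    The exact split is implementation-defined; this is one common choice."""
    base, extra = divmod(n_iters, n_threads)
    return [base + 1 if t < extra else base for t in range(n_threads)]


def dynamic_chunks(n_iters, chunk=1):
    """schedule(dynamic, chunk): fixed-size chunks handed to whichever thread asks next."""
    sizes = []
    remaining = n_iters
    while remaining > 0:
        c = min(chunk, remaining)
        sizes.append(c)
        remaining -= c
    return sizes


def guided_chunks(n_iters, n_threads, min_chunk=1):
    """Classic guided self-scheduling (GSS): next chunk = ceil(remaining / P),
    never smaller than min_chunk except for the final leftover iterations."""
    sizes = []
    remaining = n_iters
    while remaining > 0:
        c = min(remaining, max(min_chunk, math.ceil(remaining / n_threads)))
        sizes.append(c)
        remaining -= c
    return sizes


if __name__ == "__main__":
    print(static_chunks(100, 4))   # [25, 25, 25, 25]
    print(dynamic_chunks(10, 4))   # [4, 4, 2]
    print(guided_chunks(100, 4))   # [25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1]
```

All three kinds fix their chunk-size rule before the loop runs; none of them adjusts chunk sizes based on measured iteration execution times.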

Highlights

On the road to Exascale, we observe that modern and future high performance computing (HPC) systems combine an increasing number of computing nodes and, in particular, cores per node.
This work is significant because it bridges the gap between the state of the art and the state of the practice of load balancing in multithreaded applications.
Two microbenchmarks and three computing node types are used to evaluate the performance of the dynamic loop self-scheduling (DLS) techniques already available in the LLVM OpenMP runtime library (RTL) and of those newly implemented in LB4OMP.
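Why the choice of chunk sequence matters for load balance can be seen in a toy simulation. The sketch below hands chunks, in order, to whichever thread becomes idle first, then reports each thread's finishing time and a percent-imbalance figure. It is a hypothetical illustration only: `simulate` and `percent_imbalance` are invented names, scheduling overhead and memory effects are ignored, and (max/mean - 1) x 100 is one common imbalance metric, not necessarily the one used in the paper's evaluation. Real `schedule(static)` assigns chunks to threads by thread ID rather than to the first idle thread; with exactly one chunk per thread, both give the same result.

```python
import heapq


def simulate(costs, chunk_sizes, n_threads):
    """Toy self-scheduling model: chunks are taken in order, and each one goes to the
    thread that becomes idle first. Scheduling overhead is ignored.
    Returns the finishing time of every thread."""
    if sum(chunk_sizes) != len(costs):
        raise ValueError("chunk sizes must cover every iteration exactly once")
    # Min-heap of (time at which the thread becomes idle, thread id).
    idle_at = [(0.0, t) for t in range(n_threads)]
    heapq.heapify(idle_at)
    finish = [0.0] * n_threads
    start = 0
    for size in chunk_sizes:
        t_idle, tid = heapq.heappop(idle_at)
        t_done = t_idle + sum(costs[start:start + size])
        start += size
        finish[tid] = t_done
        heapq.heappush(idle_at, (t_done, tid))
    return finish


def percent_imbalance(finish):
    """(max / mean - 1) * 100: 0 means perfectly balanced threads."""
    mean = sum(finish) / len(finish)
    return (max(finish) / mean - 1.0) * 100.0


if __name__ == "__main__":
    costs = [float(i) for i in range(1, 9)]  # iteration i costs i time units
    blocks = simulate(costs, [4, 4], 2)      # one block per thread, like schedule(static)
    single = simulate(costs, [1] * 8, 2)     # like schedule(dynamic, 1)
    print(blocks, round(percent_imbalance(blocks), 1))  # [10.0, 26.0] 44.4
    print(single, round(percent_imbalance(single), 1))  # [16.0, 20.0] 11.1
```

On this loop with increasing iteration costs, fine-grained self-scheduling cuts the imbalance from about 44% to about 11%; in a real run, the extra scheduling overhead of many small chunks would offset part of that gain.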

Summary

INTRODUCTION

On the road to Exascale, we observe that modern and future high performance computing (HPC) systems combine an increasing number of computing nodes and, in particular, cores per node. Applications using LB4OMP benefit from improved performance due to its portfolio of DLS techniques, some of which adapt during execution to unpredictable variations in application and systemic characteristics (see Section 3.1). The 14 techniques in LB4OMP were selected to cover a broad spectrum of dynamic (and adaptive) scheduling techniques. The novelty of this work lies in providing a standalone and unified implementation of efficient scheduling techniques from the literature, which is needed to spur new research in scheduling and load balancing for Exascale systems. This work is significant because it bridges the gap between the state of the art and the state of the practice of load balancing in multithreaded applications. This will allow the large degree of heterogeneous node-level parallelism in today's pre-Exascale and upcoming Exascale systems to be exploited efficiently to improve application performance.
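The idea of a portfolio of DLS techniques is easier to picture with one concrete technique from the literature. The sketch below computes the chunk sequence of factoring with a fixed factor of 2 (FAC2), a classic non-adaptive DLS technique: each batch schedules half of the remaining iterations R, split into P equal chunks of ceil(R / (2P)). It is a hypothetical stand-alone illustration; the function name is invented here, and it does not reproduce LB4OMP's code or API. Adaptive techniques would additionally update chunk sizes from runtime measurements, which this sketch omits.

```python
import math


def fac2_chunks(n_iters, n_threads):
    """Factoring with a fixed factor of 2 (FAC2): each batch schedules half of the
    remaining iterations, split into n_threads equal chunks of ceil(R / (2P))."""
    sizes = []
    remaining = n_iters
    while remaining > 0:
        # Chunk size is fixed for the whole batch, computed from R at batch start.
        chunk = math.ceil(remaining / (2 * n_threads))
        for _ in range(n_threads):
            if remaining == 0:
                break
            c = min(chunk, remaining)
            sizes.append(c)
            remaining -= c
    return sizes


if __name__ == "__main__":
    print(fac2_chunks(100, 4))
    # [13, 13, 13, 13, 6, 6, 6, 6, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1]
```

For the same 100-iteration loop on 4 threads, the classic guided self-scheduling rule ceil(R/P) would hand out a first chunk of 25 iterations. FAC2 starts at 13 and keeps chunk sizes equal within each batch, leaving more small chunks near the end to absorb variability in iteration execution times.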

RELATED WORK

THE LB4OMP LIBRARY

Dynamic Loop Scheduling Techniques

Features for Performance Measurement

Load Balancing Applications with LB4OMP

PERFORMANCE RESULTS AND DISCUSSION

Design of Factorial Experiments

Performance Analysis

Impact of Chunk Parameter Choice

Influence of Chunk Size Progression

CONCLUSIONS AND FUTURE WORK


Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Apr 1, 2022
Citations: 5
License type: CC BY 4.0
