CHAMELEON: Reactive Load Balancing for Hybrid MPI+OpenMP Task-Parallel Applications

Jannis Klinkenberg, Philipp Samfass, Michael Bader, Christian Terboven, Matthias S. Müller

doi:10.1016/j.jpdc.2019.12.005

Open Access

Abstract

Many applications in high performance computing are designed based on underlying performance and execution models. While these models could successfully be employed in the past for balancing load within and between compute nodes, modern software and hardware increasingly make performance predictability difficult, if not impossible. Consequently, balancing computational load becomes much more difficult. Aiming to tackle these challenges in search of a general solution, we present a novel library for fine-granular, task-based reactive load balancing in distributed memory based on MPI and OpenMP. With our approach, individual migratable tasks can be executed on any MPI rank. The actual executing rank is determined at run time based on online performance data. We evaluate our approach under an enforced power cap and under enforced clock frequency changes for a synthetic benchmark, and show its robustness against work-induced imbalances in a realistic application. Our experiments demonstrate speedups of up to 1.31×.
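The central mechanism of the abstract, deciding at run time which rank executes a migratable task based on online performance data, can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not CHAMELEON's API or its actual decision logic: the `Rank` class, `offload_step`, `balance`, the 10% `threshold`, and the greedy most-loaded-to-least-loaded policy are all hypothetical, and the per-rank `time_per_unit` stands in for whatever speed measurements a real runtime would collect. In an actual MPI program, load information and task data would travel between ranks as messages.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Rank:
    """Hypothetical stand-in for one MPI rank and its queue of migratable tasks."""
    rank_id: int
    queue: List[float] = field(default_factory=list)  # work units of pending tasks
    time_per_unit: float = 1.0  # assumed online measurement: seconds per work unit

    def predicted_load(self) -> float:
        # Expected time to drain the local queue at the currently observed speed.
        return sum(self.queue) * self.time_per_unit


def offload_step(ranks: List[Rank], threshold: float = 0.1) -> Optional[Tuple[int, int]]:
    """Hypothetical policy: move one task from the most to the least loaded rank.

    Returns (source, destination) if a task migrated, or None if no move helps.
    """
    loads = [r.predicted_load() for r in ranks]
    src = max(range(len(ranks)), key=lambda i: loads[i])
    dst = min(range(len(ranks)), key=lambda i: loads[i])
    mean = sum(loads) / len(loads)
    # React only when the relative imbalance exceeds the threshold.
    if mean == 0 or (loads[src] - loads[dst]) / mean <= threshold:
        return None
    task = ranks[src].queue[-1]  # candidate: the task at the back of the queue
    # Migrate only if the receiver would still finish before the sender does now;
    # this prevents tasks from ping-ponging between ranks.
    if loads[dst] + task * ranks[dst].time_per_unit >= loads[src]:
        return None
    ranks[dst].queue.append(ranks[src].queue.pop())
    return (src, dst)


def balance(ranks: List[Rank], threshold: float = 0.1, max_steps: int = 1000) -> int:
    """Repeat reactive steps until no beneficial migration remains; return the count."""
    migrations = 0
    while migrations < max_steps and offload_step(ranks, threshold) is not None:
        migrations += 1
    return migrations
```

For example, with two ranks each holding eight unit-cost tasks, where rank 0 runs 1.5 times slower (say, because of a power cap; hypothetical numbers), `balance` performs two migrations: rank 0 keeps six tasks (9.0 s predicted) and rank 1 runs ten (10.0 s), lowering the predicted maximum from 12.0 s to 10.0 s. Because the speed estimate is measured rather than assumed, the same loop reacts if a rank slows down mid-run.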

Highlights

Over the past decades, most scientific applications have been developed under the assumption of a homogeneous execution environment where every compute node, and even every single core, in a larger cloud or High Performance Computing (HPC) system has a constant, equal speed.
We present our library implementation based on MPI+OpenMP that allows an incremental integration into existing task-based applications with minimal programming effort.
All tests are conducted on the HPC production system of RWTH Aachen University, CLAIX, which is equipped with an Intel Omni-Path interconnect and dual-socket Intel Xeon E5-2650v4 processor nodes with a TDP of 105 W.

Summary

Introduction

Most scientific applications have been developed under the assumption of a homogeneous execution environment in which every compute node, and even every single core, of a larger cloud or High Performance Computing (HPC) system runs at a constant, equal speed. Under this assumption, executing the same amount of work on every node should require the same computation time. In the past, this execution model proved highly accurate and efficient for balancing computational load. As both hardware and software become increasingly complex, however, this model may no longer be sufficient on current and future systems. For example, variations in CPU power efficiency arising from the manufacturing process can lead to performance variations in the presence of an enforced power cap [12]. Another source of dynamic variability stems from modern numerics in simulation applications, such as particle simulations or iterative codes employing adaptive mesh refinement. Traditional approaches such as global re-partitioning of work were an effective technique for ensuring proper balance in the past; they rely on a cost model to predict future execution time.

This paper makes the following contributions:

1. We present the first conceptual generalization of reactive load balancing to arbitrary MPI-parallel task-based applications, detailing both the requirements and the limitations associated with it.
2. We present our library implementation based on MPI+OpenMP that allows an incremental integration into existing task-based applications with minimal programming effort.
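The introduction's argument, that equal work implies equal time only when every node runs at the same speed, and that cost-model-based re-partitioning helps only while its predictions hold, can be made concrete with a simple makespan model: if rank i receives work w_i and runs at speed s_i, a bulk-synchronous step ends when the slowest rank finishes, at max_i w_i / s_i. This is an illustrative assumption rather than a model taken from the paper; the function names and all speed values below are hypothetical.

```python
def static_makespan(total_work: float, speeds: list) -> float:
    """Homogeneous model: every rank gets an equal share of the work."""
    share = total_work / len(speeds)
    # The step ends when the slowest rank finishes its share.
    return max(share / s for s in speeds)


def model_makespan(total_work: float, predicted: list, actual: list) -> float:
    """Cost-model re-partitioning: shares follow the *predicted* speeds,
    but execution time depends on the *actual* speeds at run time."""
    total_pred = sum(predicted)
    return max(total_work * p / total_pred / a for p, a in zip(predicted, actual))


def ideal_makespan(total_work: float, speeds: list) -> float:
    """Lower bound: work split exactly in proportion to the actual speeds."""
    return total_work / sum(speeds)
```

With 400 work units and hypothetical speeds `[1, 1, 1, 0.6]` (one node throttled), the equal split takes about 166.7 time units against an ideal of about 111.1. A partition tuned to those speeds reaches the ideal, but if the throttling later moves to rank 0 (`[0.6, 1, 1, 1]`), the same partition takes about 185.2, worse than the naive split. When speeds change unpredictably, a prediction made once cannot keep up, which is the gap reactive balancing targets.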

Reactive load balancing

Smart decision-making

Hiding overhead

Generalization and modularity

Execution environment for migratable tasks

A migratable task paradigm

Communication infrastructure

Task execution and termination detection

Making effective load balancing decisions

Experimental evaluation

Robustness against hardware variations

Robustness against work-induced imbalances

Related work

Findings

Conclusion & future work

Journal: Journal of Parallel and Distributed Computing
Publication Date: Dec 16, 2019
Citations: 36
License type: cc-by

Similar Papers

Reactive Task Migration for Hybrid MPI+OpenMP Applications
Jannis Klinkenberg ... Christian Terboven
01 Jan 2020

Troubleshooting throughput bottlenecks using executable models
John A Zinky ... Joshua Etkin
Computer Networks and ISDN Systems | VOL. 24
01 Mar 1992

Predictive, reactive and replication-based load balancing of tasks in Chameleon and sam(oa) 2
Philipp Samfass ... Michael Bader
05 Jul 2021

Chapter 4 - The CUDA Execution Model
CUDA Application Design and Development
01 Jan 2010
