Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization

Yu Pei, Akihiro Ida, Jack Dongarra, Ichitaro Yamazaki, George Bosilca

doi:10.1109/paw-atm49560.2019.00008

Abstract

To minimize data movement, many parallel applications statically distribute computational tasks among the processes. However, modern simulations often encounter irregular computational tasks whose computational loads change dynamically at runtime or are data dependent. As a result, load imbalance among the processes at each step of the simulation is a natural situation that must be dealt with at the programming level. The de facto parallel programming approach, flat MPI (one process per core), is hardly suitable for managing this imbalance: it imposes significant idle time on the simulation because processes have to wait for the slowest process at each step. One critical application for many domains is the LU factorization of a large dense matrix stored in the Block Low-Rank (BLR) format. Using the low-rank format can significantly reduce the cost of factorization in many scientific applications, including the boundary element analysis of electrostatic fields. However, partitioning the matrix based on the underlying geometry produces matrix blocks of different sizes whose numerical ranks change at each step of the factorization, which leads to load imbalance among the processes at each step. We use BLR LU factorization as a test case to study the programmability and performance of five different programming approaches: (1) flat MPI, (2) Adaptive MPI (Charm++), (3) MPI + OpenMP, (4) parameterized task graph (PTG), and (5) dynamic task discovery (DTD). The last two versions use a task-based paradigm to express the algorithm; we rely on the PaRSEC runtime system to execute the tasks. We first point out programming features needed to efficiently solve this category of problems, hinting at possible alternatives to the MPI+X programming paradigm. We then evaluate the programmability of the different approaches, detailing our experience implementing the algorithm using each of the models. Finally, we show the performance results on the Intel Haswell-based Bridges system at the Pittsburgh Supercomputing Center (PSC) and analyze the effectiveness of the implementations in addressing the load imbalance.
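The idle-time argument in the abstract can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes independent tasks with made-up costs, ignores data movement and task dependencies, and uses hypothetical helper names (`static_makespan`, `dynamic_makespan`). It only compares a fixed round-robin assignment against a greedy "next free process takes the next task" policy.

```python
import heapq

def static_makespan(costs, num_procs):
    """Static round-robin: task i is fixed to process i % num_procs."""
    loads = [0.0] * num_procs
    for i, cost in enumerate(costs):
        loads[i % num_procs] += cost
    # Every process waits for the slowest one at the end of the step.
    return max(loads)

def dynamic_makespan(costs, num_procs):
    """Greedy dynamic policy: the process that frees up first takes the next task."""
    finish_times = [0.0] * num_procs
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)

# Hypothetical task costs: a few expensive blocks among many cheap ones.
costs = [8, 1, 1, 1, 8, 1, 1, 1, 8, 1, 1, 1]
print("static :", static_makespan(costs, 4))
print("dynamic:", dynamic_makespan(costs, 4))
```

With these assumed costs, the round-robin assignment places all three expensive tasks on the same process, so the other three sit idle for most of the step, while the greedy policy spreads the work more evenly. Real BLR factorization adds dependencies and communication costs that this model leaves out.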

Publication Date: Nov 1, 2019
Citations: 22	License type: other-oa

Similar Papers

Bridging the Gap Between Flat and Hierarchical Low-Rank Matrix Formats: The Multilevel Block Low-Rank Format
Patrick R Amestoy ... Theo A Mary
SIAM Journal on Scientific Computing | VOL. 41
01 Jan 2019

On the Complexity of the Block Low-Rank Multifrontal Factorization
Patrick Amestoy ... Theo Mary
SIAM Journal on Scientific Computing | VOL. 39
01 Jan 2017

2D Static Resource Allocation for Compressed Linear Algebra and Communication Constraints
Olivier Beaumont ... Lionel Eyraud-Dubois
01 Dec 2020

Multi-Level Load Balancing with an Integrated Runtime Approach
Seonmyeong Bak ... Harshitha Menon
01 May 2018
