A Survey on Malleability Solutions for High-Performance Distributed Computing

Jose I Aliaga,Iker Martín-Álvarez,Rafael Mayo,Sergio Iserte,Maribel Castillo

doi:10.3390/app12105231

Open Access

https://doi.org/10.3390/app12105231

Journal: Applied Sciences
Publication Date: May 22, 2022
Citations: 9
License type: CC BY 4.0

Affiliation: Jaume I University

Abstract

Maintaining a high rate of productivity, measured as completed jobs per unit of time, in High-Performance Computing (HPC) facilities is a cornerstone of the next generation of exascale supercomputers. Process malleability is presented as a straightforward mechanism to address that issue. Nowadays, the vast majority of HPC facilities are intended for distributed-memory applications based on the Message Passing (MP) paradigm. For this reason, many efforts build on the Message Passing Interface (MPI), the de facto standard programming model. Malleability aims to rescale executions on the fly, that is, to reconfigure the number and layout of processes in running applications. Process malleability involves reallocating resources within the HPC system, managing the processes of the application, and redistributing data among those processes to resume the execution. This manuscript compiles how different frameworks address process malleability, their main features, their integration in resource management systems, and how they may be used in user codes. This paper is a detailed state-of-the-art review devised as an entry point for researchers who are interested in process malleability.
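
The data-redistribution step mentioned above can be illustrated with a minimal sketch. It is not taken from any of the surveyed frameworks: it assumes a one-dimensional array split into contiguous blocks, and the function names `block_bounds` and `redistribution_plan` are hypothetical. The sketch only computes which index ranges each old process would have to send to each new process when the process count changes; in a real MPI application those transfers would be carried out with actual communication calls.

```python
def block_bounds(n_items, n_procs, rank):
    """Return the half-open [start, end) range owned by `rank`
    in a contiguous block distribution (assumed layout)."""
    base, extra = divmod(n_items, n_procs)
    # The first `extra` ranks each hold one additional item.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def redistribution_plan(n_items, old_procs, new_procs):
    """List the (source_rank, target_rank, start, end) transfers
    needed to move from `old_procs` to `new_procs` processes."""
    plan = []
    for src in range(old_procs):
        s_start, s_end = block_bounds(n_items, old_procs, src)
        for dst in range(new_procs):
            d_start, d_end = block_bounds(n_items, new_procs, dst)
            # The overlap of the two ranges is what `src` must hand to `dst`.
            lo, hi = max(s_start, d_start), min(s_end, d_end)
            if lo < hi:
                plan.append((src, dst, lo, hi))
    return plan


if __name__ == "__main__":
    # Hypothetical scenario: expand a 10-element job from 2 to 4 processes.
    for transfer in redistribution_plan(10, 2, 4):
        print(transfer)
```

The same function covers shrinking as well as expanding, since it only compares the old and new block layouts. Ranges where source and target rank coincide may stay in place when a framework reuses the existing processes.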

Full Text