Adaptive and Power-Aware Resilience for Extreme-scale Computing

Xiaolong Cui, Taieb Znati, Rami Melhem

doi:10.22606/fsp.2017.11004

Abstract

With concerted efforts from researchers in hardware, software, algorithms, and resource management, HPC is moving towards extreme scale, featuring a computing capability of exaFLOPS. As we approach this new era of computing, however, several daunting scalability challenges remain to be conquered. Delivering extreme-scale performance will require a computing platform that supports billion-way parallelism, necessitating a dramatic increase in the number of computing, storage, and networking components. At such a large scale, failure becomes the norm rather than the exception, driving the system to significantly lower efficiency with an unprecedented amount of power consumption. To tackle these challenges, we propose an adaptive, power-aware algorithm, referred to as Lazy Shadowing, as an efficient, scalable approach to achieve high levels of resilience, through forward progress, in extreme-scale, failure-prone computing environments. Lazy Shadowing associates with each process a "shadow" (process) that executes at a reduced rate, and opportunistically rolls forward each shadow to catch up with its leading process during failure recovery. Compared to existing fault tolerance methods, our approach can achieve 20% energy savings with a potential reduction in solution time at scale.
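
The shadowing idea in the abstract can be illustrated with a toy timing model. This is only a sketch under simplifying assumptions that are not taken from the paper: a single main process runs at unit rate, its shadow runs at a fixed fraction of that rate, at most one failure occurs, and the shadow takes over at full rate once the main fails. The function name and parameters are hypothetical.

```python
from typing import Optional


def completion_time(work: float, shadow_rate: float,
                    failure_time: Optional[float] = None) -> float:
    """Toy model: time to finish `work` units of computation.

    Assumptions (illustrative only, not the paper's model):
      - the main process executes at rate 1.0;
      - the shadow executes the same work at `shadow_rate` (0 < rate <= 1);
      - if the main fails at `failure_time`, the shadow continues
        from its own progress at rate 1.0 until the work is done.
    """
    if not 0.0 < shadow_rate <= 1.0:
        raise ValueError("shadow_rate must be in (0, 1]")
    # No failure before the main finishes: the main completes on its own.
    if failure_time is None or failure_time >= work:
        return work
    # Work the slower shadow has already completed when the main fails.
    shadow_progress = shadow_rate * failure_time
    # The shadow speeds up and finishes the remainder at full rate.
    return failure_time + (work - shadow_progress)


if __name__ == "__main__":
    print(completion_time(100.0, 0.5))        # failure-free run
    print(completion_time(100.0, 0.5, 40.0))  # main fails at t = 40
```

In this model a slower shadow does less redundant work during failure-free execution but has more to catch up on after a failure, which is the trade-off that the choice of shadow execution rate controls.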

More From: Frontiers in Signal Processing
