PREVENTIVE MIGRATION VS. PREVENTIVE CHECKPOINTING FOR EXTREME SCALE SUPERCOMPUTERS

Franck Cappello, Henri Casanova, Yves Robert

doi:10.1142/s0129626411000126

Abstract

An alternative to classical fault-tolerant approaches for large-scale clusters is failure avoidance, by which the occurrence of a fault is predicted and a preventive measure is taken. We develop analytical performance models for two types of preventive measures: preventive checkpointing and preventive migration. We instantiate these models for platform scenarios representative of current and future technology trends. We find that preventive migration is the better approach in the short term by orders of magnitude. However, in the longer term, both approaches have comparable merit with a marginal advantage for preventive checkpointing. We also develop an analytical model of the performance for fault tolerance based on periodic checkpointing and compare this approach to both failure avoidance techniques. We find that this comparison is sensitive to the nature of the stochastic distribution of the time between failures, and that failure avoidance is likely inferior to fault tolerance in the long term. Regardless, our results show that each approach is likely to achieve poor utilization for large-scale platforms (e.g., 2^20 nodes) unless the mean time between failures is large. We show how bounding parallel job size improves utilization, but conclude that achieving good utilization in future large-scale platforms will require a combination of techniques.
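To make the utilization claim concrete, the sketch below estimates utilization under periodic checkpointing. It is not the model developed in the paper: it assumes independent, exponentially distributed node failures (so the platform MTBF is the node MTBF divided by the node count) and uses Young's classical first-order approximation for the checkpoint period. All function names and the numeric values in the usage note are hypothetical, chosen only for illustration.

```python
import math


def platform_mtbf(node_mtbf_s: float, num_nodes: int) -> float:
    """Platform MTBF, assuming independent exponential node failures."""
    return node_mtbf_s / num_nodes


def young_period(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the checkpoint period: sqrt(2*C*M)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def first_order_waste(checkpoint_cost_s: float, period_s: float, mtbf_s: float) -> float:
    """Fraction of time lost: checkpoint overhead C/T plus expected rework T/(2M).

    The approximation only holds when C is much smaller than M, so the
    result is clamped to 1.0 once it stops being meaningful.
    """
    waste = checkpoint_cost_s / period_s + period_s / (2.0 * mtbf_s)
    return min(1.0, waste)


def estimate_utilization(node_mtbf_s: float, num_nodes: int, checkpoint_cost_s: float) -> float:
    """Rough utilization estimate under the assumptions stated above."""
    mtbf = platform_mtbf(node_mtbf_s, num_nodes)
    period = young_period(checkpoint_cost_s, mtbf)
    return 1.0 - first_order_waste(checkpoint_cost_s, period, mtbf)


if __name__ == "__main__":
    year_s = 365 * 24 * 3600
    # Hypothetical scenario: 2^20 nodes and a 600 s checkpoint.
    for node_mtbf_years in (10, 1000):
        u = estimate_utilization(node_mtbf_years * year_s, 2**20, 600.0)
        print(f"node MTBF {node_mtbf_years} years -> estimated utilization {u:.2f}")
```

Under these assumptions, a 10-year node MTBF on 2^20 nodes gives a platform MTBF of about five minutes, shorter than the assumed checkpoint itself, so the estimate collapses to zero. Only a much larger node MTBF brings the estimate back to a usable level, which is consistent with the abstract's conclusion.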


Journal: Parallel Processing Letters
Publication Date: Jun 1, 2011
Citations: 41

Similar Papers

Checkpointing vs. Migration for Post-Petascale Supercomputers
Franck Cappello ... Yves Robert
01 Sep 2010

Future trends: businness view and high tech
José Rafael Marques Da Silva ... Manuela Correia
01 Jan 2020

The Status and Future Trends of Process Control Computer Technology in Japan
Yoshihiro Matsumoto
01 Jan 1984

Estimation of Term Premiums from Average Yield Differentials in the Term Structure of Interest Rates
Charles R Nelson
Econometrica | VOL. 40
01 Mar 1972
