Abstract

Incremental checkpoint mechanism was introduced to reduce high checkpoint overhead of regular (full) checkpointing, especially in high-performance computing systems. To gain an extra advantage from the incremental checkpoint technique, we propose an optimal checkpoint frequency function that globally minimizes the expected wasted time of the incremental checkpoint mechanism. Also, the re-computing time coefficient used to approximate the re-computing time is derived. Moreover, to reduce the complexity in the recovery state, full checkpoints are performed from time to time. In this paper we present an approach to evaluate the appropriate constant number of incremental checkpoints between two consecutive full checkpoints. Although the number of incremental checkpoints is constant, the checkpoint interval derived from the proposed model varies depending on the failure rate of the system. The checkpoint time is illustrated in the case of a Weibull distribution and can be easily simplified to the exponential case.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call