On the Effect of System Failures in Optimal Checkpointing Policies

Masanori Odagiri, Shunji Osaki, Tadashi Dohi, Naoto Kaio

doi:10.5687/iscie.9.41

Abstract

In a file system, checkpointing is necessary to recover the system after a system failure occurs. If checkpoints are taken frequently, the cost of generating them increases. On the other hand, if checkpoints are taken infrequently, the cost of recovery via the journal increases when a system failure occurs. It is therefore important to determine theoretically the optimal checkpoint sequence that minimizes the expected cost. In this paper we consider the situation in which all data files in secondary storage can be destroyed by a media failure occurring at a checkpoint, and derive the optimal checkpointing policy. A fault-tolerant design that takes such catastrophic media failures into account is useful for achieving high reliability from the standpoint of risk avoidance.
