Abstract

If the variables used for a checkpointing algorithm have data faults, the existing checkpointing and recovery algorithms may fail. In this paper, self-stabilizing data fault detecting and correcting, checkpointing, and recovery algorithms are proposed in a ring topology. The proposed data fault detection and correction algorithms can handle data faults; at most one per process, but in any number of processes. The proposed checkpointing algorithm can deal with concurrent multiple initiations of checkpointing and data faults. A process can recover from a fault, using the proposed recovery algorithm in spite of multiple data faults present in the system. All the proposed algorithms converge in O ( n ) steps, where n is the number of processes. The algorithm can be extended to work for general topologies too.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call