A Non-blocking Checkpointing Algorithm for Distributed Systems

Liu Guoliang, Chen Shuyu, Zhang Xiaoqin

doi:10.4156/jdcta.vol5.issue7.29

Abstract

Checkpointing and rollback recovery is an effective fault-tolerance technique that is widely used in parallel and distributed computer systems. We present a non-blocking coordinated checkpointing algorithm for distributed systems that differs from the conventional approach, in which processes first take temporary checkpoints and then convert them to permanent ones. The proposed algorithm allows processes to take permanent checkpoints directly, without taking temporary checkpoints, which contributes to its speed of execution. Orphan messages are eliminated by the sender processes, and in-transit messages are handled through the checkpointing interval and a retransmission mechanism. The algorithm reduces the control-message complexity of taking a checkpoint from O(n^2) to O(n), and the number of control messages per checkpoint to n-1.
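
The message count stated above can be illustrated with a toy model. The following Python sketch is not the authors' algorithm: it assumes a single initiator that sends one checkpoint request to each of the other processes, and it omits the handling of orphan and in-transit messages entirely. The names `Process`, `take_permanent_checkpoint`, and `run_checkpoint_round` are hypothetical and chosen only for illustration. It shows just two points from the abstract: each process saves a permanent checkpoint directly, and one round costs n-1 control messages.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process model; not taken from the paper."""
    pid: int
    state: int = 0
    permanent_checkpoints: list = field(default_factory=list)

    def take_permanent_checkpoint(self):
        # The state is saved directly as permanent: there is no temporary
        # checkpoint and no later conversion step.
        self.permanent_checkpoints.append(self.state)


def run_checkpoint_round(processes, initiator_pid=0):
    """Run one checkpoint round and return the number of control messages.

    Assumption: a single initiator sends exactly one request to every
    other process; the abstract does not say how requests are routed.
    """
    control_messages = 0
    for p in processes:
        if p.pid != initiator_pid:
            control_messages += 1  # one request from the initiator to p
        p.take_permanent_checkpoint()
    return control_messages


procs = [Process(pid=i, state=10 * i) for i in range(5)]
sent = run_checkpoint_round(procs)
print(sent)  # 4, i.e. n-1 for n = 5
```

By contrast, a scheme in which every process exchanges control messages with every other process would need on the order of n^2 messages per round, which is the kind of cost the abstract says the algorithm avoids.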
