Efficient checkpointing procedures for fault tolerant distributed systems

Kassem Saleh, Anjali Agarwal

doi:10.1016/0165-6074(94)90107-4

Journal: Journal of Systems Architecture
Publication Date: Jul 1, 1994
Citations: 1

Affiliation: Kuwait University, Concordia University

#Checkpointing Procedures #Consistent Checkpoint

Abstract

A classical approach to fault tolerance in distributed systems is to incorporate efficient, fault-tolerant procedures for checkpointing and recovery. We propose two checkpointing procedures that can be initiated by any process in the system, or upon the failure of one or more component processes. The procedures return the most recent consistent checkpoints for the processes initiating the procedure, and they do not interfere with the progress of the distributed application. Furthermore, both procedures guarantee that a consistent checkpoint has been obtained when they terminate. Examples illustrating the application of the procedures are also provided.
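To make the notion of a "most recent consistent checkpoint" concrete, the following is a minimal Python sketch of the textbook rollback idea. It is not the two procedures proposed in the paper, which the abstract does not detail. The `Checkpoint` record, its fields, and both function names are hypothetical. The sketch assumes each checkpoint stores the cumulative sets of message ids sent and received so far, and that every process has an initial checkpoint with empty sets. A set of checkpoints is treated as consistent when no checkpoint records a receive whose matching send is unrecorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical record of one process's saved state (not from the paper)."""
    process: str
    seq: int                           # position in that process's checkpoint history
    sent: frozenset = frozenset()      # ids of all messages sent so far (cumulative)
    received: frozenset = frozenset()  # ids of all messages received so far (cumulative)


def is_consistent(checkpoints):
    """True if every recorded receive has its send recorded in some checkpoint."""
    checkpoints = list(checkpoints)
    all_sent = set().union(*(cp.sent for cp in checkpoints))
    return all(cp.received <= all_sent for cp in checkpoints)


def latest_consistent(history):
    """Pick the latest consistent checkpoint per process.

    history maps a process name to its checkpoints, oldest first.
    Assumption: index 0 is an initial checkpoint with empty message sets.
    """
    index = {p: len(cps) - 1 for p, cps in history.items()}
    while True:
        cut = [history[p][i] for p, i in index.items()]
        all_sent = set().union(*(cp.sent for cp in cut))
        # Processes whose checkpoint records a receive with no recorded send.
        orphaned = [cp.process for cp in cut if not cp.received <= all_sent]
        if not orphaned:
            return {cp.process: cp for cp in cut}
        for p in orphaned:
            if index[p] == 0:
                raise ValueError(f"initial checkpoint of {p} is not empty")
            index[p] -= 1  # roll this process back one checkpoint and retry
```

For example, if process P sends message `m1` only after its latest checkpoint while Q's latest checkpoint already records receiving `m1`, the sketch rolls Q back by one checkpoint and leaves P where it is.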

Full Text