Abstract

The paper presents a deadlock-free fault recovery algorithm for a fully distributed system in which messages need not arrive in the order in which they were sent. The method is based on asynchronous, atomic checkpointing of the sender and receiver of a message. Messages that are not balanced in the last permanent checkpoints are recorded in the new checkpoints. Fault recovery relies on (a) re-sending all lost messages according to the record of unbalanced messages in the last permanent checkpoints, and (b) undoing every message re-sent during fault recovery, or undoing a computation repeated according to that record. Fault recovery involves only the processes that communicated before the failure. A distributed computation may be split into a few segments without affecting transaction consistency. The algorithm uses the minimum number of messages. A proof of the resilience of the fault recovery algorithm is presented.
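
The idea of recording unbalanced messages in a checkpoint and re-sending them after a rollback can be illustrated with a small sketch. This is not the paper's algorithm: the names `Message`, `Checkpoint`, and `Process`, the rule that a message counts as balanced once its receipt appears in the receiver's permanent checkpoint, and the use of duplicate suppression in place of the paper's undo step are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Message:
    msg_id: int
    sender: str
    receiver: str
    payload: str


@dataclass
class Checkpoint:
    applied: list       # payloads applied when the checkpoint was taken
    received_ids: set   # ids of messages whose receipt is covered
    unbalanced: list    # sent messages not yet covered by the receiver


@dataclass
class Process:
    name: str
    applied: list = field(default_factory=list)
    received_ids: set = field(default_factory=set)
    sent: list = field(default_factory=list)
    permanent: Optional[Checkpoint] = None

    def send(self, msg_id, receiver, payload):
        msg = Message(msg_id, self.name, receiver, payload)
        self.sent.append(msg)
        return msg

    def deliver(self, msg):
        # Assumption: duplicates are ignored, standing in for the undo step.
        if msg.msg_id in self.received_ids:
            return False
        self.received_ids.add(msg.msg_id)
        self.applied.append(msg.payload)
        return True

    def checkpoint(self, balanced_ids=frozenset()):
        # Assumption: a sent message is unbalanced until the receiver's
        # permanent checkpoint records its receipt.
        unbalanced = [m for m in self.sent if m.msg_id not in balanced_ids]
        self.permanent = Checkpoint(
            list(self.applied), set(self.received_ids), unbalanced
        )

    def recover(self):
        # Roll back to the last permanent checkpoint (empty if none exists).
        cp = self.permanent
        self.applied = list(cp.applied) if cp else []
        self.received_ids = set(cp.received_ids) if cp else set()


# Hypothetical scenario: Q checkpoints after message 1, receives message 2,
# then fails before checkpointing again.
p, q = Process("P"), Process("Q")
m1 = p.send(1, "Q", "a")
q.deliver(m1)
q.checkpoint()
m2 = p.send(2, "Q", "b")
q.deliver(m2)
p.checkpoint(balanced_ids=q.permanent.received_ids)

q.recover()                                  # Q loses the effect of m2
state_after_rollback = list(q.applied)
resent = [q.deliver(m) for m in p.permanent.unbalanced]
duplicate_accepted = q.deliver(m1)           # m1 is already balanced
```

In this scenario only the two processes that exchanged messages take part in recovery, and only the message missing from Q's checkpoint is re-sent, which mirrors the locality and message-economy claims in the abstract at toy scale.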
