Abstract

Recovery from transient faults in distributed systems has been studied intensively, but to the best of our knowledge no previous study has addressed transient and permanent hard faults together. Our study is devoted to process recovery in a distributed environment in the presence of hard faults, whether transient or permanent. The recovery mechanism we present can be based on any one of six proposed strategies that combine checkpointing and message logging between the processes of a distributed application. This exhaustive set of strategies is system-dependent. The strategies were examined with respect to the propagation of recovery across processes, in order to prevent the tedious, well-known domino effect. The framework considered was a distributed system composed of a set of autonomous nodes, each running a local system; some of these nodes were designated to replace failing ones in case of a permanent fault. Our main contribution was to enable a distributed application to complete its mission despite node crashes. Preliminary experimental results from a fault-tolerant mechanism based on one of the proposed strategies suggest that our proposals are effective.
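To make the combination of checkpointing and message logging concrete, the following is a minimal sketch and not one of the six strategies of the paper, whose details are not given in this abstract. It assumes a hypothetical process whose state is a running sum of the integers it receives, that message handling is deterministic, and that the checkpoint and the log survive the crash (for example on stable storage). The names `Process`, `deliver`, `take_checkpoint` and `recover_on_spare` are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical application process; its state is a running sum."""
    state: int = 0
    checkpoint: int = 0  # last saved state (stands in for stable storage)
    log: list = field(default_factory=list)  # messages received since the checkpoint

    def deliver(self, msg: int) -> None:
        # Log the message before processing it, so it can be replayed later.
        self.log.append(msg)
        self.state += msg

    def take_checkpoint(self) -> None:
        # Save the current state; earlier log entries are no longer needed.
        self.checkpoint = self.state
        self.log.clear()


def recover_on_spare(checkpoint: int, log: list) -> Process:
    """Rebuild a crashed process on a spare node (assumed to be available)."""
    spare = Process(state=checkpoint, checkpoint=checkpoint)
    for msg in log:
        # Deterministic replay of the logged messages, in their original order.
        spare.deliver(msg)
    return spare


# Usage: the process crashes after receiving 5 and 6; a spare takes over.
p = Process()
p.deliver(3)
p.deliver(4)
p.take_checkpoint()
p.deliver(5)
p.deliver(6)
spare = recover_on_spare(p.checkpoint, list(p.log))
```

In this toy model the spare node reaches the pre-crash state by replaying only its own logged messages, so no other process has to roll back, which is the property that avoids a domino effect.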
