Abstract
The authors revisit the problem of software fault tolerance in distributed systems. In particular, we propose an extension of a message-driven confidence-driven (MDCD) protocol we have developed for error containment and recovery in a particular type of distributed embedded system. More specifically, we augment the original MDCD protocol by introducing the method of fine-grained confidence adjustment, which enables us to remove the architectural restrictions. The dynamic nature of the MDCD approach gives it a number of desirable characteristics. First, this approach does not impose any restrictions on interactions among application software components or require costly message-exchange based process coordination/synchronization. Second, the algorithms allow redundancies to be applied only to low-confidence or critical interacting software components in a distributed system, permitting flexible realization of software fault tolerance. Finally, the dynamic error containment and recovery mechanisms are transparent to the application and ready to be implemented by generic middleware.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.