Abstract

Proposes a hierarchical error detection framework for a software-implemented fault tolerance (SIFT) layer of a distributed system. A four-level error detection hierarchy is proposed in the context of Chameleon, a software environment for providing adaptive fault tolerance in an environment of commercial off-the-shelf (COTS) system components and software. The design and implementation of a software-based distributed signature monitoring scheme, which is central to the proposed four-level hierarchy, is described. Both intra-level and inter-level optimizations that minimize the overhead of detection and are capable of adapting to runtime requirements are proposed. The paper presents results from a prototype implementation of two levels of the error detection hierarchy and results of a detailed simulation of the overall environment. The results indicate a substantial increase in availability due to the detection framework and help in understanding the tradeoffs between overhead and coverage for different combinations of techniques.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call