Abstract
Proposes a hierarchical error detection framework for a software-implemented fault tolerance (SIFT) layer of a distributed system. A four-level error detection hierarchy is proposed in the context of Chameleon, a software environment for providing adaptive fault tolerance in an environment of commercial off-the-shelf (COTS) system components and software. The design and implementation of a software-based distributed signature monitoring scheme, which is central to the proposed four-level hierarchy, is described. Both intra-level and inter-level optimizations that minimize the overhead of detection and are capable of adapting to runtime requirements are proposed. The paper presents results from a prototype implementation of two levels of the error detection hierarchy and results of a detailed simulation of the overall environment. The results indicate a substantial increase in availability due to the detection framework and help in understanding the tradeoffs between overhead and coverage for different combinations of techniques.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have