Abstract

Many software-implemented fault tolerance techniques rely on process checkpointing and restore primitives. This holds both for system fault tolerance and for software fault tolerance methods such as Recovery Blocks, yet the two usually appear to require different ad hoc primitives. Moreover, when components implementing different kinds of fault tolerance use checkpointing primitives, that use should be coordinated to save space and time. In this paper we present a unified interface for checkpointing and restore primitives that is suitable for both software and system fault tolerance in UNIX-type systems. We provide examples of the use of these primitives, including their use in a dedicated software component (the Recovery Meta Program) that may implement various fault tolerance techniques. Finally, we discuss the implementation of the proposed primitives and provide a comparison with some complementary approaches.
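
To make the role of checkpointing in Recovery Blocks concrete, the following is a minimal sketch, not the interface proposed in the paper. It assumes an in-memory state held in a dictionary, and the names `checkpoint`, `restore`, and `recovery_block` are hypothetical, chosen only for illustration. A real UNIX implementation would save and restore process state rather than a Python object.

```python
import copy


class RecoveryBlockError(Exception):
    """Raised when every alternate fails its acceptance test."""


def checkpoint(state):
    # Hypothetical primitive: save a deep copy of the in-memory state.
    return copy.deepcopy(state)


def restore(state, saved):
    # Hypothetical primitive: roll the live state back in place.
    state.clear()
    state.update(copy.deepcopy(saved))


def recovery_block(state, alternates, acceptance_test):
    """Try each alternate in order; roll back after every failure."""
    saved = checkpoint(state)
    for alternate in alternates:
        try:
            alternate(state)
            if acceptance_test(state):
                return alternate.__name__
        except Exception:
            pass  # a crash counts as a failed alternate
        restore(state, saved)
    raise RecoveryBlockError("all alternates failed the acceptance test")


def fast_sort(state):
    # Deliberately faulty primary: it drops the largest element.
    state["data"] = sorted(state["data"])[:-1]


def safe_sort(state):
    # Simple, trusted secondary alternate.
    state["data"] = sorted(state["data"])


def sorted_version_of(original):
    # Build an acceptance test that checks the result against the input.
    expected = sorted(original)
    return lambda state: state["data"] == expected


state = {"data": [3, 1, 2]}
used = recovery_block(state, [fast_sort, safe_sort], sorted_version_of([3, 1, 2]))
print(used, state["data"])
```

In this sketch the faulty primary fails the acceptance test, the state is rolled back to the checkpoint, and the secondary alternate then runs on the original data. If every alternate fails, the state is left as it was at the checkpoint and an error is raised.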
