Abstract

The authors propose a hardware-supported scheme for fast checkpointing and failure recovery. The mechanism uses a small bank of nonvolatile memory to save an incremental checkpoint for a processor, which reduces the time overhead incurred by checkpointing. A parity technique is employed to compress the checkpointing information. An important feature of the scheme is that the checkpointing operation is dissociated from the parity update action. As a result, checkpointing latency does not depend on the speed of parity updates and is therefore reduced. Moreover, the scheme does not require an atomic action to update the parity data, and it allows each processor to initiate a checkpoint independently of the others. Experimental results show that the overhead of the mechanism is small and is not sensitive to the number of checkpoints taken by the processors. This observation suggests that the proposed hardware-supported scheme is promising for improving the performance of checkpoint/rollback-recovery systems.
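
As a rough illustration of how a parity technique can compress checkpoint data, the sketch below assumes bytewise XOR parity over equal-length checkpoint blocks, one block per processor. The abstract does not specify the parity function or the data layout, so the function names (`build_parity`, `update_parity`, `recover`) and the block format are hypothetical and do not represent the authors' hardware design. The point is only that a single parity block can stand in for several checkpoints, and that it can be refreshed from the old and new contents of one processor's checkpoint without reading the others.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def build_parity(checkpoints: list[bytes]) -> bytes:
    """Fold all checkpoint blocks into one parity block (assumed XOR parity)."""
    return reduce(xor_blocks, checkpoints)


def update_parity(parity: bytes, old: bytes, new: bytes) -> bytes:
    """Refresh parity after one processor replaces its checkpoint.

    Only that processor's old and new blocks are needed, so the update
    could run after the checkpoint itself has already been saved.
    """
    return xor_blocks(parity, xor_blocks(old, new))


def recover(parity: bytes, surviving: list[bytes]) -> bytes:
    """Rebuild a single lost checkpoint from parity and the surviving blocks."""
    return reduce(xor_blocks, surviving, parity)


# Hypothetical checkpoint blocks for three processors.
checkpoints = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = build_parity(checkpoints)

# Processor 1 fails; its checkpoint is rebuilt from parity and the other two.
restored = recover(parity, [checkpoints[0], checkpoints[2]])

# Processor 1 takes a new checkpoint on its own; parity is patched afterwards.
new_block = b"\xaa\xbb"
patched = update_parity(parity, checkpoints[1], new_block)
```

In this sketch, saving `new_block` and calling `update_parity` are separate steps, which loosely mirrors the separation of checkpointing from parity update described above. How the real scheme orders these steps and keeps the parity consistent without an atomic update is not recoverable from the abstract.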
