Continuous checkpointing: joining the checkpointing with virtual memory paging

Shang-Te Hsu, Ruei-Chuan Chang

doi:10.1002/(sici)1097-024x(199709)27:9<1103::aid-spe130>3.0.co;2-2

Abstract

Checkpointing is a basic mechanism for backward error recovery in fault-tolerant systems. A checkpointed process periodically stops execution and saves its state to files. To reduce file sizes, only data modified between two consecutive checkpoints is saved. However, existing approaches do not consider operating system paging activity, which, if ignored, may double the number of disk accesses required to checkpoint non-resident dirty pages. In this paper, we propose continuous checkpointing, which combines the checkpoint facility with virtual memory paging operations. Checkpointing thus proceeds continuously during the lifetime of a process without extra overhead. Checkpoint size is no longer proportional to application size but is instead bounded by the number of resident dirty pages. Experimental results show that disk accesses can be reduced by about 80% when checkpointing large applications. © 1997 John Wiley & Sons, Ltd.
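
The disk-access argument can be illustrated with a toy accounting model. This is only a sketch under stated assumptions, not the authors' implementation or measurement setup: the `Page` type, both function names, and the per-page costs are hypothetical. It assumes that a paging-unaware checkpointer must read a paged-out dirty page back from swap before writing it to the checkpoint file (two accesses), and that under continuous checkpointing the page-out write already serves as the checkpoint copy, so only resident dirty pages remain to be written at checkpoint time.

```python
from dataclasses import dataclass


@dataclass
class Page:
    dirty: bool     # modified since the previous checkpoint
    resident: bool  # currently in physical memory (False = paged out)


def conventional_checkpoint_io(pages):
    """Disk accesses at checkpoint time when paging is ignored (assumed model)."""
    accesses = 0
    for page in pages:
        if not page.dirty:
            continue  # incremental checkpointing skips unmodified pages
        if not page.resident:
            accesses += 1  # assumed: read the page back in from swap
        accesses += 1  # write the page to the checkpoint file
    return accesses


def continuous_checkpoint_io(pages):
    """Disk accesses at checkpoint time when page-outs double as checkpoint writes."""
    # Assumed: non-resident dirty pages were already saved when they were paged
    # out, so only resident dirty pages still need to be written.
    return sum(1 for page in pages if page.dirty and page.resident)


# Hypothetical workload: 3 resident dirty, 4 paged-out dirty, 3 clean pages.
pages = (
    [Page(dirty=True, resident=True)] * 3
    + [Page(dirty=True, resident=False)] * 4
    + [Page(dirty=False, resident=True)] * 3
)

print(conventional_checkpoint_io(pages))  # 3 writes + 4 * (read + write) = 11
print(continuous_checkpoint_io(pages))    # 3 writes for resident dirty pages
```

In this model the cost at checkpoint time depends only on the resident dirty pages, which mirrors the bound stated in the abstract. The page counts are invented for illustration and are unrelated to the roughly 80% reduction reported in the paper.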
