Abstract

Checkpoint and rollback recovery is a technique that allows a system to tolerate a failure by periodically saving the entire state and, if an error occurs, rolling back to the prior checkpoint. This technique is particularly suited to applications with long execution times, such as those typically found in supercomputer environments. This paper presents a technique that embeds support for checkpoint and rollback recovery directly into the virtual memory translation hardware. The scheme is general enough to be implemented on various scopes of data, such as a portion of an address space, a single address space, or multiple address spaces. A basic model is developed that measures the amount of work required by the scheme as a function of the checkpoint interval size. This model is then used to show how much the overhead decreases as the interval size increases.
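The abstract does not describe the hardware mechanism or the exact form of the work model. The following is therefore only a software sketch under stated assumptions, not the paper's scheme. It assumes page-granularity copy-on-first-write: after each checkpoint every page is treated as write-protected, so the first store to a page saves its old contents (as a protection-fault handler or translation hardware might), later stores to that page in the same interval cost nothing extra, and rollback restores the saved pages. The names `CheckpointedMemory`, `PAGE_SIZE`, and `copies_per_write` are hypothetical. The "work" metric (pages copied per store) is an assumed stand-in for the paper's model.

```python
PAGE_SIZE = 4  # words per page (hypothetical; tiny so the example stays small)


class CheckpointedMemory:
    """Page-granularity copy-on-first-write checkpointing (illustrative sketch).

    After a checkpoint, the first write to a page saves that page's old
    contents; further writes to the same page in the interval are free.
    """

    def __init__(self, num_pages):
        self.pages = [[0] * PAGE_SIZE for _ in range(num_pages)]
        self.saved = {}       # page number -> contents as of the last checkpoint
        self.page_copies = 0  # work counter: total pages copied so far

    def write(self, addr, value):
        page, offset = divmod(addr, PAGE_SIZE)
        if page not in self.saved:  # first write to this page since checkpoint
            self.saved[page] = list(self.pages[page])
            self.page_copies += 1
        self.pages[page][offset] = value

    def read(self, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        return self.pages[page][offset]

    def checkpoint(self):
        # The current state becomes the new recovery point; old copies are dropped.
        self.saved.clear()

    def rollback(self):
        # Restore every page modified since the last checkpoint.
        for page, contents in self.saved.items():
            self.pages[page] = contents
        self.saved.clear()


def copies_per_write(trace, interval, num_pages):
    """Pages copied per store when a checkpoint is taken every `interval` stores."""
    mem = CheckpointedMemory(num_pages)
    for i, addr in enumerate(trace):
        if i % interval == 0:
            mem.checkpoint()
        mem.write(addr, i)
    return mem.page_copies / len(trace)
```

With a hypothetical trace that sweeps 32 words (8 pages) eight times, `copies_per_write(list(range(32)) * 8, k, 8)` gives 1.0 for `k = 1`, 0.25 for `k = 32`, and 0.03125 for `k = 256`. Repeated writes to a page within one interval are absorbed by a single copy, which is one plausible reason overhead falls as the interval grows. The actual shape of the curve depends on the workload's write locality.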
