Abstract

This paper considers a communication system consisting of many processors and studies how to improve its reliability through checkpointing and rollback recovery. When a processor failure or a communication error occurs, the processors affected by the event are rolled back to their most recent checkpoint, so that a consistent state is maintained across the whole system. A stochastic model of these recovery techniques is formulated using the theory of Markov renewal processes. The mean time to take a checkpoint and the expected numbers of rollback recoveries caused by processor failures and by communication errors are derived. Further, the optimal checkpointing interval that minimizes the expected cost is discussed analytically.
