A server subject to random breakdowns and repairs offers services to incoming jobs whose lengths are highly variable. A checkpointing policy is in operation, aiming to protect against possibly lengthy recovery periods by backing up the current state at periodic checkpoints. The problem of how to choose a checkpointing interval to optimise performance is addressed by analysing a general queueing model which includes breakdowns, repairs, back-ups and recoveries. Exact solutions are obtained under both Markovian and non-Markovian assumptions. Numerical experiments illustrate the conditions where checkpoints are useful and where they are not, and, in the former case, quantify the achievable benefits.
Read full abstract