Abstract

Checkpointing is an effective way to ensure that long-running jobs complete in Desktop Grids, which are subject to frequent resource failures. We focus on checkpointing strategies for Desktop Grids, including volunteer computing systems, where individual hosts follow diverse failure distributions. We propose an algorithm that computes a sequence of checkpoint interval lengths for each individual host from a sample of its availability interval lengths. The algorithm approximates the probability distribution of availability interval lengths directly with the sample, without deriving a closed form of the distribution. In simulations with synthetic trace data and trace data from a real volunteer computing project, this sample-based strategy wastes less time than a periodic strategy in most cases.
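
To make the idea of working directly from a sample concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a greedy rule chosen here for illustration: at each step, pick the candidate interval that maximizes the fraction of time spent on useful work, weighted by the empirical probability that the host stays available until the checkpoint is written. The names `survival`, `checkpoint_sequence`, `checkpoint_cost`, and `candidates` are hypothetical.

```python
from bisect import bisect_left


def survival(sorted_sample, t):
    """Empirical P(availability interval length >= t) from a sorted sample."""
    n = len(sorted_sample)
    return (n - bisect_left(sorted_sample, t)) / n


def checkpoint_sequence(sample, checkpoint_cost, candidates, max_steps=50):
    """Greedily build a sequence of checkpoint interval lengths for one host.

    Assumption (illustrative heuristic, not taken from the paper): the score
    of an interval w at host age t is
        P(host survives to t + w + cost | survived to t) * w / (w + cost).
    """
    data = sorted(sample)
    t = 0.0          # time elapsed since the host became available
    sequence = []
    for _ in range(max_steps):
        alive_now = survival(data, t)
        if alive_now == 0:
            break    # no sampled interval lasts this long
        best_w, best_score = None, 0.0
        for w in candidates:
            # Conditional survival estimated directly from the sample.
            p = survival(data, t + w + checkpoint_cost) / alive_now
            score = p * w / (w + checkpoint_cost)
            if score > best_score:
                best_w, best_score = w, score
        if best_w is None:
            break    # every candidate would outlive the whole sample
        sequence.append(best_w)
        t += best_w + checkpoint_cost
    return sequence


if __name__ == "__main__":
    observed = [10, 20, 30, 40]  # made-up availability lengths
    print(checkpoint_sequence(observed, checkpoint_cost=1, candidates=[4, 9]))
```

No closed-form distribution is fitted at any point: every probability is a ratio of counts over the sample, so the resulting sequence can differ from host to host and need not be periodic.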
