Using Checkpointing to Enhance Turnaround Time on Institutional Desktop Grids

Patricio Domingues, Artur Andrzejak, Luis Silva

doi:10.1109/e-science.2006.261157

Abstract

In this paper, we present a checkpoint-based scheme to improve the turnaround time of bag-of-tasks applications executed on institutional desktop grids. We propose sharing checkpoints among desktop machines to reduce the negative impact of resource volatility. We evaluate several scheduling policies: first-come, first-served (FCFS), adaptive timeouts, simple replication, replication with checkpoint on demand, and prediction-based checkpointing combined with replication. We performed trace-driven simulations of the proposed scheduling algorithms using a set of real traces collected from an academic desktop grid environment. The results show that sharing checkpoints may considerably reduce application turnaround time compared with the private-checkpoint approach.

Full Text