Abstract

Job checkpointing is a common technique for providing fault tolerance in grid computing environments. Its effectiveness depends on the number of checkpoints taken and the length of the checkpoint interval; a fluctuating checkpoint interval can delay job execution. In this paper, a new fault-tolerant job scheduling algorithm based on checkpointing is presented and evaluated. When scheduling a job, the system uses RFOH to combine the average failure time and failure rate of grid resources with resource latency to generate scheduling decisions. In RFOH, the system uses the failure tendency of previously assigned resources to calculate the checkpoint interval for each job. Simulated experiments are conducted in a grid simulator to quantify the performance of the proposed system. The experiments show that the proposed system can considerably improve latency.
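
The abstract does not state the formula RFOH uses to derive a checkpoint interval from a resource's failure history. As a rough illustration of the general idea only, the sketch below substitutes Young's classical first-order approximation, in which the interval is the square root of twice the checkpoint cost times the mean time between failures. This is not the paper's method; the function names, the timestamp format, and the sample values are all hypothetical.

```python
import math


def mean_time_between_failures(failure_times):
    """Average gap, in seconds, between consecutive failure timestamps.

    `failure_times` is a hypothetical failure history for one resource.
    """
    if len(failure_times) < 2:
        raise ValueError("need at least two failure timestamps")
    ordered = sorted(failure_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def young_checkpoint_interval(checkpoint_cost, mtbf):
    """Young's approximation: interval = sqrt(2 * checkpoint_cost * MTBF).

    Stand-in for the RFOH calculation, which the abstract does not specify.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


# Hypothetical history: failures observed at t = 0 s, 100 s, and 300 s.
history = [0.0, 100.0, 300.0]
mtbf = mean_time_between_failures(history)
interval = young_checkpoint_interval(checkpoint_cost=12.0, mtbf=mtbf)
```

Under this stand-in formula, a resource that fails more often has a smaller mean time between failures and therefore gets a shorter checkpoint interval, which matches the abstract's idea of tying the interval to each resource's failure tendency.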
