Abstract

Scientific workflows consist of many interdependent tasks whose ordering is dictated by their data dependencies. As these workflows become increasingly resource-intensive in both data and computation, private clouds struggle to meet their resource requirements. Public clouds are often proposed as a remedy for these limitations; however, offloading workflow execution entirely to public clouds causes excessive data transfer. In this paper, we address the problem of scheduling scientific workflows on hybrid clouds, aiming to reduce cost (including public cloud charges) while improving execution time by minimizing data movement between the private and public clouds. To this end, we develop Hybrid Scheduling for Hybrid Clouds (HSHC), a novel data-locality-aware scheduling algorithm. HSHC adopts a hybrid approach consisting of a static phase and a dynamic phase. The static phase solves the workflow scheduling problem with an extended genetic algorithm, using static information about workflows and resources. The dynamic phase adapts the static schedule at run time in response to changing execution conditions, such as the locality of intermediate output data and performance degradation of tasks and resources. We evaluate HSHC on both random workflows and real-world scientific applications in terms of execution time and cost. Experimental comparisons with seven state-of-the-art algorithms show that HSHC reduces cost by up to 40% and improves execution time by up to 25%.
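To make the data-locality trade-off concrete, the following is a minimal sketch and not the HSHC algorithm itself. It replaces the extended genetic algorithm with an exhaustive search over a four-task toy workflow. The task names, edge sizes, prices, and private-cloud capacity are all hypothetical values chosen for illustration.

```python
from itertools import product

# Hypothetical toy workflow: each edge carries the data volume (GB)
# passed from a producer task to a consumer task.
EDGES = {("t1", "t2"): 4.0, ("t1", "t3"): 1.0, ("t2", "t4"): 2.0, ("t3", "t4"): 3.0}
TASKS = ["t1", "t2", "t3", "t4"]

# Assumed parameters (not from the paper).
PRIVATE_SLOTS = 2        # private cloud can host at most two tasks
PUBLIC_TASK_COST = 1.0   # charge per task run on the public cloud
TRANSFER_COST = 0.5      # charge per GB moved between clouds


def cross_cloud_gb(placement):
    """Total data (GB) on edges whose endpoints sit in different clouds."""
    return sum(gb for (src, dst), gb in EDGES.items()
               if placement[src] != placement[dst])


def total_cost(placement):
    """Public compute charges plus inter-cloud transfer charges."""
    public_tasks = sum(1 for site in placement.values() if site == "public")
    return public_tasks * PUBLIC_TASK_COST + cross_cloud_gb(placement) * TRANSFER_COST


def best_placement():
    """Exhaustive search (a stand-in for a genetic algorithm) over all
    placements that respect the private-cloud capacity."""
    best = None
    for sites in product(("private", "public"), repeat=len(TASKS)):
        if sites.count("private") > PRIVATE_SLOTS:
            continue
        placement = dict(zip(TASKS, sites))
        cost = total_cost(placement)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


if __name__ == "__main__":
    cost, placement = best_placement()
    print(cost, placement)
```

In this toy instance, keeping the two tasks joined by the heaviest edge in the same cloud is cheaper than sending everything to the public cloud. This is the intuition behind a locality-aware objective. A realistic scheduler would also model execution time and would need a heuristic search, since exhaustive enumeration grows exponentially with the number of tasks.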

