Abstract

By renting pay-as-you-go cloud resources (e.g., virtual machines) to do science, the data transfers required during the execution of data-intensive scientific workflows may be remarkably costly not only regarding the workflow execution time (makespan) but also regarding money. As such transfers are prone to delays, they may jeopardise the makespan, stretch the period of resource rentals and, as a result, compromise budgets. In this paper, we explore the possibility of trading some communication for computation during the scheduling production, aiming to schedule a workflow by duplicating some computation of its tasks on which other dependent-tasks critically depend upon to lessen communication between them. This paper explores this premise by enhancing the Heterogeneous Earliest Finish Time (HEFT) algorithm and the Lookahead variant of HEFT. The proposed approach is evaluated using simulation and synthetic data from four real-world scientific workflow applications. Our proposal, which is based on task duplication, can effectively reduce the size of data transfers, which, in turn, contributes to shortening the rental duration of the resources, in addition to minimising network traffic within the cloud.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call