Scheduling Scientific Workflows on Clouds Using a Task Duplication Approach

Thiago Augusto Lopes Genez, Rizos Sakellariou, Torsten Braun, Luiz Fernando Bittencourt, Edmundo Roberto Mauro Madeira

doi:10.1109/ucc.2018.00017

Abstract

When pay-as-you-go cloud resources (e.g., virtual machines) are rented to do science, the data transfers required during the execution of data-intensive scientific workflows may be remarkably costly, not only in workflow execution time (makespan) but also in money. As such transfers are prone to delays, they may jeopardise the makespan, stretch the period of resource rentals and, as a result, compromise budgets. In this paper, we explore the possibility of trading some communication for computation when producing the schedule: a workflow is scheduled by duplicating the computation of tasks on which other tasks critically depend, thereby lessening the communication between them. We explore this premise by enhancing the Heterogeneous Earliest Finish Time (HEFT) algorithm and the Lookahead variant of HEFT. The proposed approach is evaluated using simulation and synthetic data from four real-world scientific workflow applications. Our proposal, which is based on task duplication, can effectively reduce the size of data transfers, which, in turn, contributes to shortening the rental duration of the resources, in addition to minimising network traffic within the cloud.
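
The trade-off between communication and computation described in the abstract can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, and it is not HEFT or its Lookahead variant. It only compares, for a single parent-child pair, the time at which the child's input becomes available when the parent's output is transferred over the network against the time at which it becomes available when the parent is re-executed on the child's virtual machine. All function names, parameters, and numbers are hypothetical and chosen for illustration.

```python
def ready_time_with_transfer(parent_finish, data_size, bandwidth):
    """Time at which the child's input arrives if the parent's output is
    shipped over the network (assumed model: size / bandwidth, no latency)."""
    return parent_finish + data_size / bandwidth


def ready_time_with_duplication(vm_available, parent_inputs_ready, parent_runtime):
    """Time at which the child's input is available if the parent is
    re-executed on the child's VM (assumed: its output is then local)."""
    start = max(vm_available, parent_inputs_ready)
    return start + parent_runtime


def should_duplicate(parent_finish, data_size, bandwidth,
                     vm_available, parent_inputs_ready, parent_runtime):
    """Duplicate the parent only if doing so makes the child's input
    available strictly earlier than a network transfer would."""
    via_transfer = ready_time_with_transfer(parent_finish, data_size, bandwidth)
    via_duplicate = ready_time_with_duplication(
        vm_available, parent_inputs_ready, parent_runtime)
    return via_duplicate < via_transfer


# Hypothetical numbers: times in seconds, data in MB, bandwidth in MB/s.
large_output = should_duplicate(parent_finish=10, data_size=800, bandwidth=100,
                                vm_available=4, parent_inputs_ready=0,
                                parent_runtime=9)
small_output = should_duplicate(parent_finish=10, data_size=100, bandwidth=100,
                                vm_available=4, parent_inputs_ready=0,
                                parent_runtime=9)
```

With these assumed values, duplication pays off for the large output (the transfer would deliver the data at 18 s, re-execution at 13 s) but not for the small one (11 s by transfer against 13 s by re-execution). A real scheduler would also have to account for the rental cost of the extra computation, which this sketch ignores.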

Full Text