Abstract

Scientific workflow applications involve large numbers of tasks and data sets that must be processed in a systematic manner. These applications benefit from cloud computing platforms, which offer access to virtually limitless resources provisioned elastically and on demand. Running data-intensive scientific workflows on geographically distributed data centres incurs massive amounts of data transfer, which affects both the overall execution time and the monetary cost of the workflows. Existing efforts on workflow scheduling concentrate on reducing makespan and budget; little attention has been paid to task and data set dependencies. In this paper, we introduce a workflow scheduling technique that reduces data transfer and executes workflow tasks within deadline and budget constraints. The proposed technique consists of an initial data placement stage, which clusters and distributes data sets based on their dependencies, and a replication-based partial critical path (R-PCP) technique, which schedules tasks with data locality and dynamically maintains a dependency matrix for the placement of generated data sets. To reduce runtime data set movement, we use inter-data-centre task replication and data set replication to ensure data set availability. Simulation results with four workflow applications show that our strategy efficiently reduces data movement and executes all chosen workflows within the user-specified budget and deadline. Results reveal that R-PCP yields 44.93% and 31.37% less data movement than random and adaptive data-aware scheduling (ADAS) techniques, respectively, and 26.48% less energy consumption than ADAS.
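
As a rough illustration of the dependency-based data placement idea summarised above, the sketch below clusters data sets by counting how many tasks share each pair and greedily co-locating strongly dependent pairs. The structures and names (task_inputs, greedy_clusters, the two-centre setting) are illustrative assumptions, not the paper's actual R-PCP implementation.

```python
# Minimal sketch (assumed, not the paper's implementation) of clustering
# data sets by the tasks that share them, using a dependency matrix.
from itertools import combinations

# Hypothetical input: which data sets each task consumes.
task_inputs = {
    "t1": {"d1", "d2"},
    "t2": {"d2", "d3"},
    "t3": {"d4"},
}

data_sets = sorted({d for ds in task_inputs.values() for d in ds})

# dependency[i][j] = number of tasks that use both data set i and data set j.
dependency = {d: {e: 0 for e in data_sets} for d in data_sets}
for inputs in task_inputs.values():
    for a, b in combinations(sorted(inputs), 2):
        dependency[a][b] += 1
        dependency[b][a] += 1

def greedy_clusters(num_centres=2):
    """Greedy placement: co-locate the most strongly dependent pairs first."""
    clusters = [set() for _ in range(num_centres)]
    pairs = sorted(
        ((dependency[a][b], a, b) for a, b in combinations(data_sets, 2)),
        reverse=True,
    )
    placed = set()
    for weight, a, b in pairs:
        if weight == 0:
            break
        if a not in placed and b not in placed:
            # Put the dependent pair in the currently smallest cluster.
            min(clusters, key=len).update({a, b})
            placed.update({a, b})
    for d in data_sets:  # scatter any remaining data sets
        if d not in placed:
            min(clusters, key=len).add(d)
    return clusters

print(greedy_clusters())
```

Under these assumptions, data sets d1, d2 and d3 (shared by tasks t1 and t2) end up co-located, while d4 is placed wherever capacity allows, which is the intuition behind reducing inter-data-centre transfers at run time.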
