A workflow is a systematic computation or data-intensive application with regular computation and data access patterns. Exploiting these runtime regularities effectively is key to designing scalable scheduling algorithms in Cloud environments. Existing research largely treats task scheduling and the optimization of data management for workflows separately, and little attention has been paid so far to understanding how the two combine. In this paper, we introduce an adaptive data-aware scheduling (ADAS) strategy for workflow applications and show that coordinating task computation with data management can improve scheduling performance. Our model takes data management into account to obtain a satisfactory makespan across multiple datacenters, while our adaptive data-dependency analysis reveals parallelization opportunities. ADAS consists of a set-up stage, which builds clusters of workflow tasks and datasets, and a run-time stage, which overlaps the execution of the workflows. Through rigorous performance evaluation studies, we demonstrate that our strategy can effectively improve workflow completion time and resource utilization in a Cloud environment.