As an emerging business computing model, cloud computing must handle the scientific workflows submitted by user groups. Efficiently scheduling the massive number of tasks in a scientific workflow is therefore an important problem in cloud computing. To minimize the total execution time of a workflow, reduce the consumption of cloud resources, and lower users' execution costs, this paper proposes a new task scheduling algorithm based on task duplication and task grouping. The algorithm consists of four steps. First, the join nodes are duplicated so that the directed acyclic graph (DAG) is converted into an in-tree graph. Second, all tasks are divided into task groups, which reduces the communication overhead between tasks. Third, some task groups are merged by exploiting the idle time between tasks within a task group, which reduces the number of processors used. Finally, the tasks are assigned to processors by making full use of the processors' idle time, which increases resource utilization. The new algorithm is compared with TDS and TDCS on the CloudSim simulation platform. The performance indicators used for comparison are the workflow makespan, the number of processors used, and resource utilization. The experimental results show that the new algorithm achieves a smaller makespan, uses fewer processors, and attains higher resource utilization for both compute-intensive and data-intensive workflows. Its advantage on all three indicators is most pronounced for data-intensive workflows.
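The three performance indicators named above can be made concrete with a minimal sketch. The abstract does not give their formulas, so the definitions below are assumptions: makespan is taken as the latest finish time with the workflow starting at time 0, and resource utilization as total busy time divided by (processors used × makespan). The `Slot` record, the function names, and the example schedule are hypothetical and are not taken from the paper or from CloudSim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One scheduled task execution (hypothetical record, not a CloudSim type)."""
    task: str
    processor: int
    start: float
    finish: float


def makespan(schedule):
    # Assumption: the workflow starts at time 0, so makespan is the latest finish time.
    return max(slot.finish for slot in schedule)


def processors_used(schedule):
    # Count the distinct processors that run at least one task.
    return len({slot.processor for slot in schedule})


def utilization(schedule):
    # Assumed definition: busy time over the total capacity of the used processors.
    busy = sum(slot.finish - slot.start for slot in schedule)
    return busy / (processors_used(schedule) * makespan(schedule))


# Illustrative schedule: task t1 is duplicated on processor 1 so that t3 can
# start without waiting for data to be transferred from processor 0.
example = [
    Slot("t1", 0, 0.0, 4.0),
    Slot("t2", 0, 4.0, 10.0),
    Slot("t1_copy", 1, 0.0, 4.0),
    Slot("t3", 1, 4.0, 9.0),
]

print(makespan(example), processors_used(example), utilization(example))
```

Under these assumed definitions, duplicating a task adds busy time on another processor, so a scheduler that also merges task groups and fills idle slots can raise utilization without lengthening the makespan.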