Cloud IaaS platforms readily provide homogeneous multi-core machines, whether bare-metal or virtual. Each of these machines can be equipped with high-performance SSD storage. This makes it possible to distribute the files produced during workflow execution across different machines, minimizing the additional cost of transferring them. In this paper, we propose a scheduling algorithm called WSRDT (Workflow Scheduling Reducing Data Transfers), whose purpose is to minimize the makespan (execution time) of data-intensive workflows by reducing data transfers between dependent tasks over the network. Intermediate files produced by tasks are stored locally on the disk of the machine where those tasks were executed. We verify experimentally that increasing the number of cores per machine reduces the additional cost due to data transfers over the network. Experiments with a real workflow show the advantages of the proposed algorithms. Data-driven scheduling significantly reduces both the execution time and the volume of data transferred over the network, and our approach outperforms one of the best state-of-the-art algorithms, which we adapted to our assumptions.
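The general idea of placing a task where most of its input data already resides can be illustrated with a minimal sketch. This is not the WSRDT algorithm itself, whose details are not given in this abstract; it is a simple greedy locality rule written under assumed conditions. The function names `transfer_cost` and `pick_machine`, the file names, and the machine names are all hypothetical.

```python
from typing import Dict, List


def transfer_cost(inputs: List[str], machine: str,
                  location: Dict[str, str], size: Dict[str, int]) -> int:
    """Bytes that must cross the network if the task runs on `machine`.

    Files already stored on that machine's local disk cost nothing.
    """
    return sum(size[f] for f in inputs if location[f] != machine)


def pick_machine(inputs: List[str], location: Dict[str, str],
                 size: Dict[str, int], free_cores: Dict[str, int]) -> str:
    """Among machines with a free core, pick the one with the least data to fetch.

    Ties are broken by machine name so the choice is deterministic.
    """
    candidates = [m for m, cores in free_cores.items() if cores > 0]
    if not candidates:
        raise RuntimeError("no free core available")
    return min(candidates,
               key=lambda m: (transfer_cost(inputs, m, location, size), m))


# Hypothetical example: two intermediate files on two machines.
location = {"a.dat": "m1", "b.dat": "m2"}   # where each file was produced
size = {"a.dat": 500, "b.dat": 100}         # file sizes in bytes
free_cores = {"m1": 1, "m2": 2}

chosen = pick_machine(["a.dat", "b.dat"], location, size, free_cores)
```

In this example the task reads both files; running it on `m1` requires fetching only `b.dat`, whereas running it on `m2` requires fetching the larger `a.dat`, so the rule selects `m1`. With more cores per machine, more dependent tasks can be co-located with their inputs, which is consistent with the effect reported above.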