Abstract

Scientists in each experiment team share their data and use distributed resources to conduct their experiments. These experiments are carried out in collaboration with globally dispersed teams, so scientific data need to be replicated or cached at distributed locations around the world. Data locality and data transfer overhead are important challenges when scheduling such data-intensive scientific workflow applications in cloud computing. These applications are driving the era of big data: task execution consumes and produces huge amounts of input/output data, with data dependencies among tasks. On widely distributed platforms, fine-grained tasks commonly perform poorly because scheduling and execution overheads are high. This paper presents the Clustering Method based Task Dependency (CMTD), which reduces execution overhead and improves the computational granularity of scientific workflow tasks. It also proposes a data-intensive workflow scheduling system to minimize the makespan of data-intensive workflow applications, which can be modeled as directed acyclic graphs. The clustering method is validated by simulation-based analysis using WorkflowSim.
