Abstract

Scientific workflows are a practical way to represent and execute data-intensive applications, which require not only powerful computing resources but also massive storage. With the emergence of cloud environments, which have enhanced the execution of such applications, designing workflow data placement strategies that effectively reduce data movements across data centers has become a highly challenging objective. Because a workflow executes its tasks in a prescribed order, and each task may process one or several datasets within a single data center, various data partitioning and clustering methods have been devised to find a distribution of workflow datasets among data centers that reduces dataset movements. In this work, a partitioning layer based on fuzzy data dependencies is implemented. More specifically, a dynamic massive data placement strategy is proposed that applies an Interval Type-2 Fuzzy C-Means technique. This technique is chosen to make the clusters assigned to data centers more consistent, so that datasets placed together are closely related in terms of dependency, which in turn reduces the amount of transferred data. The proposed strategy is evaluated by simulation, using both random and real-world scientific workflows. The experiments indicate that the proposed strategy noticeably outperforms relevant state-of-the-art methods by reducing the number of data movements across data centers.
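
To make the objective concrete, the following minimal Python sketch counts dataset movements for a given placement. It is an illustration only, not the paper's simulator: the function name `count_movements`, the toy workflow, and the cost model are all assumptions. In particular, it assumes each task runs in the data center that already holds most of its input datasets, and that every other input counts as one movement.

```python
from collections import Counter


def count_movements(tasks, placement):
    """Count dataset transfers needed to run every task once.

    tasks:     dict mapping a task name to the list of datasets it reads
    placement: dict mapping a dataset name to the data center storing it

    Assumed cost model (hypothetical, for illustration): a task runs in the
    data center that already stores most of its inputs, and each remaining
    input costs one movement. Moved datasets are not re-placed afterwards.
    """
    total = 0
    for datasets in tasks.values():
        if not datasets:
            continue  # a task without inputs moves nothing
        per_center = Counter(placement[d] for d in datasets)
        # Inputs already stored in the best-placed center need no transfer.
        local = max(per_center.values())
        total += len(datasets) - local
    return total


# Toy workflow: three tasks over five datasets (made-up example data).
tasks = {
    "t1": ["d1", "d2"],
    "t2": ["d2", "d3"],
    "t3": ["d4", "d5"],
}

# Placement that ignores dependencies between datasets.
scattered = {"d1": "dc1", "d2": "dc2", "d3": "dc1", "d4": "dc2", "d5": "dc1"}
# Placement that keeps datasets used by the same tasks together.
grouped = {"d1": "dc1", "d2": "dc1", "d3": "dc1", "d4": "dc2", "d5": "dc2"}

print(count_movements(tasks, scattered))
print(count_movements(tasks, grouped))
```

Under this assumed cost model, the dependency-aware placement needs fewer movements than the scattered one, which is the effect a clustering-based placement strategy aims for.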
