Abstract

Data of scientific workflow applications tend to be distributed over many data centers so that they can be stored, retrieved, and transferred efficiently. The execution performance of an experiment on such data varies widely depending on the placement of the input data and of the intermediate data generated during application execution. However, an initial data placement strategy is rarely the best plan for long-running experiments, because resource conditions change dynamically over time. We propose an adaptive data placement strategy that accounts for dynamic resource changes in data-intensive applications. The strategy consists of two stages: at build time it groups the datasets across data centers, and at runtime it repeatedly clusters each newly generated dataset to the most appropriate data center, based on task dependency, the intensity of data usage, and just-in-time resource availability. Placing data just in time, alongside task execution, utilizes resources more efficiently than placing it only during the initialization stage of an experiment. Experiments show that data movement can be effectively reduced while the workflow is running.
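The abstract does not give the runtime placement rule itself; as a rough, hypothetical sketch of the idea, the Python snippet below scores candidate data centers for a newly generated dataset using the three criteria named above (task dependency, data-usage intensity, and just-in-time resource availability). All names, weights, and data structures are assumptions for illustration, not the paper's actual algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class DataCenter:
    name: str
    free_storage: float           # fraction of storage currently free (0..1)
    free_bandwidth: float         # fraction of bandwidth currently free (0..1)
    hosted_datasets: set = field(default_factory=set)

def placement_score(dc, dependent_datasets, usage_count,
                    w_dep=0.5, w_use=0.3, w_res=0.2):
    """Higher score = better target for the new dataset (weights are assumed)."""
    # Task dependency: how many of the datasets the consuming tasks need
    # are already stored at this data center.
    shared = dc.hosted_datasets & dependent_datasets
    dep = len(shared) / max(len(dependent_datasets), 1)
    # Usage intensity: how heavily the co-located dependent datasets are accessed.
    use = sum(usage_count.get(d, 0) for d in shared) / max(sum(usage_count.values()), 1)
    # Just-in-time resource availability sampled at placement time.
    res = 0.5 * dc.free_storage + 0.5 * dc.free_bandwidth
    return w_dep * dep + w_use * use + w_res * res

def place_new_dataset(new_dataset, centers, dependent_datasets, usage_count):
    """Pick the best-scoring data center and register the new dataset there."""
    best = max(centers,
               key=lambda dc: placement_score(dc, dependent_datasets, usage_count))
    best.hosted_datasets.add(new_dataset)
    return best
```

In this reading, the runtime stage re-evaluates the score each time a dataset is produced, so placement reflects the resource availability observed at that moment rather than the conditions assumed when the experiment was initialized.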
