Abstract

I/O-intensive jobs such as stage-in, stage-out and data clean-up jobs account for a significant share of the execution time of scientific workflows. Workflow managers typically add these data management operations as supporting jobs alongside the computational tasks, while their scheduling decisions focus only on the compute jobs. We present the integration of the Pegasus Workflow Management System with a Policy Based Data Placement Service (PDPS) to reduce overall workflow execution time. Pegasus delegates all data staging jobs to PDPS, which schedules and executes stage-in jobs according to the selected data placement policy and executes stage-out and clean-up jobs directly, independently of the workflow execution state. We measure the impact of using PDPS with Pegasus first on the Montage workflow and then on a synthetic workflow. We enforce two policies and demonstrate the advantage of using PDPS for asynchronous data placement in scientific workflows. Our results show that the influence of PDPS on overall workflow runtime depends on the data characteristics of the executable workflow and on the data placement policy being enforced.
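The division of labour described above (policy-driven scheduling for stage-in jobs, immediate execution for stage-out and clean-up jobs) can be illustrated with a minimal Python sketch. It is not the PDPS or Pegasus API: `PlacementService`, `DataJob`, `JobType`, `largest_first` and `fifo` are hypothetical names chosen for illustration, and the two policies evaluated in the paper are not reproduced here. A policy is modelled simply as a function that orders the pending stage-in jobs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class JobType(Enum):
    STAGE_IN = "stage-in"
    STAGE_OUT = "stage-out"
    CLEANUP = "cleanup"


@dataclass
class DataJob:
    name: str
    kind: JobType
    size_bytes: int


# Hypothetical policy type: takes pending stage-in jobs, returns them in execution order.
Policy = Callable[[List[DataJob]], List[DataJob]]


def fifo(jobs: List[DataJob]) -> List[DataJob]:
    """Hypothetical baseline policy: keep submission order."""
    return list(jobs)


def largest_first(jobs: List[DataJob]) -> List[DataJob]:
    """Hypothetical policy: stage the largest files first."""
    return sorted(jobs, key=lambda j: j.size_bytes, reverse=True)


class PlacementService:
    """Hypothetical stand-in for a policy-based data placement service."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.pending_stage_in: List[DataJob] = []
        self.executed: List[str] = []  # names of jobs in the order they ran

    def submit(self, job: DataJob) -> None:
        if job.kind is JobType.STAGE_IN:
            # Stage-in jobs are deferred so the policy can order them.
            self.pending_stage_in.append(job)
        else:
            # Stage-out and clean-up jobs run immediately, regardless of workflow state.
            self._execute(job)

    def schedule_stage_in(self) -> None:
        # Run all pending stage-in jobs in the order chosen by the policy.
        for job in self.policy(self.pending_stage_in):
            self._execute(job)
        self.pending_stage_in.clear()

    def _execute(self, job: DataJob) -> None:
        # Placeholder for the actual transfer or deletion.
        self.executed.append(job.name)


svc = PlacementService(largest_first)
svc.submit(DataJob("in_a", JobType.STAGE_IN, 10))
svc.submit(DataJob("out_x", JobType.STAGE_OUT, 5))
svc.submit(DataJob("in_b", JobType.STAGE_IN, 50))
svc.submit(DataJob("clean_y", JobType.CLEANUP, 0))
svc.schedule_stage_in()
print(svc.executed)  # ['out_x', 'clean_y', 'in_b', 'in_a']
```

Swapping `largest_first` for `fifo` changes only the order of the stage-in transfers, which mirrors the abstract's point that the benefit depends on the policy being enforced.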
