Abstract

One of the main advantages of using a scientific workflow management system (SWfMS) to orchestrate data flows among scientific activities is the ability to control and register the whole workflow execution. Executing workflow activities on high performance computing (HPC) resources, however, poses challenges for SWfMS execution control. Current solutions leave the scheduling to the HPC queue system. Since the workflow execution engine does not run on remote clusters, the SWfMS is not aware of the parallel strategy of the workflow execution. Consequently, remote execution control and provenance registration of parallel activities are very limited on the SWfMS side. This work presents a set of components that can be included in the workflow specification of any SWfMS to control the parallelization of activities as many-task computing (MTC). In addition, these components can gather provenance data during remote workflow execution. Through these MTC components, the parallelization strategy can be registered and reused, and provenance data can be uniformly queried. We have evaluated our approach by performing parameter sweep parallelization in solving the incompressible 3D Navier-Stokes equations. Experimental results show performance gains, with the additional benefit of distributed provenance support.
