Abstract

We present an algorithm and a software architecture for a cloud-based system that executes cyclic scientific workflows whose structure may change during run time. Existing approaches either rely on workflow definitions based on directed acyclic graphs (DAGs) or require workarounds to implement cyclic structures. In contrast, our system supports cycles natively, avoids workarounds, and as such reduces the complexity of workflow modelling and maintenance. Our algorithm traverses workflow graphs and transforms them iteratively into linear sequences of executable actions. We call these sequences process chains. Our software architecture distributes the process chains to multiple compute nodes in the cloud and oversees their execution. We evaluate our approach by applying it to two practical use cases from the domains of astronomy and engineering. We also compare it with two existing workflow management systems. The evaluation demonstrates that our algorithm is able to execute dynamically changing workflows with cycles and that design and maintenance of complex workflows is easier than with existing solutions. It also shows that our software architecture can run process chains on multiple compute nodes in parallel to significantly speed up the workflow execution. An implementation of our algorithm and the software architecture is available with the Steep Workflow Management System that we released under an open-source license. The resources for the first practical use case are also available as open source for reproduction.
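The idea of iteratively turning a workflow graph into process chains can be illustrated with a minimal sketch. This is not the Steep implementation; it is a simplified illustration under stated assumptions. The names `Action` and `next_process_chains`, and the rule used to group actions into a chain, are hypothetical. The sketch assumes each action declares the variables it reads and writes, and that an action becomes executable once all its inputs are available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical workflow action with named input and output variables."""
    name: str
    inputs: frozenset
    outputs: frozenset


def next_process_chains(actions, available):
    """Split off the linear chains that can run with the variables in `available`.

    Returns (chains, remaining). Calling this again after the chains have run
    (with their outputs added to `available`) yields the next chains.
    """
    remaining = list(actions)
    chains = []
    for start in [a for a in actions if a.inputs <= available]:
        if start not in remaining:
            continue  # already absorbed into an earlier chain
        remaining.remove(start)
        chain, produced = [start], set(start.outputs)
        while True:
            # Assumed grouping rule: extend only if exactly one pending action
            # consumes the last outputs and all of its inputs are satisfied.
            consumers = [a for a in remaining if a.inputs & chain[-1].outputs]
            if len(consumers) != 1 or not consumers[0].inputs <= (available | produced):
                break
            nxt = consumers[0]
            remaining.remove(nxt)
            chain.append(nxt)
            produced |= nxt.outputs
        chains.append(chain)
    return chains, remaining


workflow = [
    Action("A", frozenset({"x"}), frozenset({"a"})),
    Action("B", frozenset({"a"}), frozenset({"b"})),
    Action("C", frozenset({"y"}), frozenset({"c"})),
    Action("D", frozenset({"b", "c"}), frozenset({"d"})),
]
chains, remaining = next_process_chains(workflow, {"x", "y"})
print([[a.name for a in c] for c in chains])  # [['A', 'B'], ['C']]
print([a.name for a in remaining])            # ['D']
```

In this sketch, `D` stays pending until both chains have finished. Because the function is re-run on whatever remains of the workflow after each round, actions added or changed during run time would simply be picked up in a later iteration; the sketch does not model cycles explicitly.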

Highlights

  • Task automation has a long history in computer science

  • We present a software architecture for a scientific workflow management system and show how it can be deployed to the cloud

  • In this paper, we described an approach to scientific workflow management that supports complex, real-world scenarios where the structure of a workflow depends on the data to be processed and may change during run time

Summary

Introduction

With the continuing growth in global data, the need for automated data processing becomes more and more evident. This applies to areas such as Bioinformatics [1], Geology [2], Geoinformatics [3, 4], Astronomy [5] (see “Use case 1: computing astronomical image mosaics” section), and Engineering (see “Use case 2: shape optimisation via structural analysis” section). A scientific workflow is a model that describes such a transformation. It is typically defined by a scientist using a directed acyclic graph (DAG).
