Abstract

Scientific workflow management systems offer features for composing complex computational pipelines from modular building blocks, executing the resulting automated workflows, and recording the provenance of data products resulting from workflow runs. Despite the advantages such features provide, many automated workflows continue to be implemented and executed outside of scientific workflow systems due to the convenience and familiarity of scripting languages (such as Perl, Python, R, and MATLAB), and to the high productivity many scientists experience when using these languages. YesWorkflow is a set of software tools that aim to provide such users of scripting languages with many of the benefits of scientific workflow systems. YesWorkflow requires neither the use of a workflow engine nor the overhead of adapting code to run effectively in such a system. Instead, YesWorkflow enables scientists to annotate existing scripts with special comments that reveal the computational modules and dataflows otherwise implicit in these scripts. YesWorkflow tools extract and analyze these comments, represent the scripts in terms of entities based on the typical scientific workflow model, and provide graphical renderings of this workflow-like view of the scripts. Future versions of YesWorkflow will also allow the prospective provenance of the data products of these scripts to be queried in ways similar to those available to users of scientific workflow systems.
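
To make the annotation idea concrete, the following is a minimal sketch of how comment-based annotations might be extracted and turned into a dataflow view. The tag names follow the @begin/@end/@in/@out style of YesWorkflow comments; the annotated script, its block and data names, and the functions extract_blocks and dataflow_edges are hypothetical and written only for illustration. This toy extractor is not the YesWorkflow implementation and does not handle nested blocks.

```python
import re

# Hypothetical annotated script, held as a string so the sketch is self-contained.
# Only the comment lines matter; the code lines are never executed.
SCRIPT = '''
# @begin clean_data
# @in raw_readings
# @out cleaned_readings
cleaned = [r for r in raw if r is not None]
# @end clean_data

# @begin plot_results
# @in cleaned_readings
# @out figure
draw(cleaned)
# @end plot_results
'''

# Matches comment lines such as "# @in raw_readings".
TAG = re.compile(r'^\s*#\s*@(begin|end|in|out)\s+(\S+)')


def extract_blocks(source):
    """Return {block_name: {'in': [...], 'out': [...]}} from annotation comments."""
    blocks = {}
    current = None
    for line in source.splitlines():
        match = TAG.match(line)
        if not match:
            continue  # ordinary code or comment line: ignored
        tag, name = match.groups()
        if tag == 'begin':
            current = name
            blocks[current] = {'in': [], 'out': []}
        elif tag == 'end':
            current = None
        elif current is not None:
            blocks[current][tag].append(name)
    return blocks


def dataflow_edges(blocks):
    """Link a producing block to a consuming block through each shared data name."""
    edges = []
    for producer, produced in blocks.items():
        for consumer, consumed in blocks.items():
            for data in produced['out']:
                if data in consumed['in']:
                    edges.append((producer, data, consumer))
    return edges


blocks = extract_blocks(SCRIPT)
edges = dataflow_edges(blocks)
```

The resulting edges list is the kind of structure a renderer could draw as a graph: blocks become process nodes and each shared data name becomes a labelled connection between them.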

Highlights

  • Many scientists use scripts or scientific workflow environments for data processing, analysis, model simulation, result visualization, and other scientific computing tasks

  • Future versions of YesWorkflow will allow the prospective provenance of the data products of these scripts to be queried in ways similar to those available to users of scientific workflow systems

  • Prospective provenance is a description of the computational process itself; that is, the workflow specification is considered a form of provenance information, describing the method by which analysis results and other data products are obtained
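
Because prospective provenance is derived from the workflow specification rather than from a run, a query over it can be answered by walking the specification alone. The sketch below assumes a hypothetical three-step workflow, described as a plain dictionary, and a hypothetical helper named upstream; it illustrates the idea only and is not the query interface of YesWorkflow.

```python
# Hypothetical workflow specification: each step lists the data it reads and writes.
WORKFLOW = {
    'load_samples': {'in': ['sample_file'], 'out': ['samples']},
    'normalize': {'in': ['samples', 'reference'], 'out': ['normalized']},
    'plot': {'in': ['normalized'], 'out': ['figure']},
}


def upstream(workflow, target):
    """Return the steps and the data items that the target data product depends on."""
    steps, data = set(), set()
    frontier = [target]
    while frontier:
        item = frontier.pop()
        for step, ports in workflow.items():
            # A step is upstream if it writes the item currently being traced.
            if item in ports['out'] and step not in steps:
                steps.add(step)
                for needed in ports['in']:
                    if needed not in data:
                        data.add(needed)
                        frontier.append(needed)
    return steps, data


steps, data = upstream(WORKFLOW, 'figure')
```

No script is executed here: the answer describes the method by which the figure would be obtained, which is what distinguishes prospective provenance from a record of a particular run.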


Summary

Introduction

Many scientists use scripts (written in Python, R, or MATLAB, for example) or scientific workflow environments for data processing, analysis, model simulation, result visualization, and other scientific computing tasks. Scientific workflow systems naturally support two forms of provenance: prospective provenance, by presenting a workflow visually as a directed graph of data and process steps, and retrospective provenance, by capturing and subsequently exporting runtime provenance. Despite these and other advanced features of workflow systems, a vast number of computational workflows continue to be developed in general-purpose or specialized scripting languages such as Python, R, and MATLAB.

YesWorkflow Model and Annotation Syntax
Alternative Workflow Views
Querying YesWorkflow Models
Prospective Data Provenance Queries
Inference of Retrospective Data Provenance
YesWorkflow Examples
Analysis of Gene Expression Microarray Data
Terrestrial Biospheric Modeling
Paleoclimate Reconstruction
YW Architecture
Related Work
Visualization of Nested Code Blocks
Functions and Function Calls
Interactive Graphs
Live Graph View
Distinguished Data and Parameters
Validation of Comments
Conclusions
