Abstract

Scientific workflows are a popular mechanism for specifying and automating data-driven in silico experiments. A significant aspect of their value lies in their potential to be reused. Once shared, workflows become useful building blocks that can be combined or modified to develop new experiments. However, previous studies have shown that storing workflow specifications alone is not sufficient to ensure that they can be successfully reused; one must also be able to understand what the workflows aim to achieve and to re-enact them. To understand a workflow, and how it may be used and repurposed for their own needs, scientists require access to additional resources such as annotations describing the workflow, datasets used and produced by the workflow, and provenance traces recording workflow executions. In this article, we present a novel approach to the preservation of scientific workflows through the application of research objects: aggregations of data and metadata that enrich the workflow specifications. Our approach is realised as a suite of ontologies that support the creation of workflow-centric research objects. Their design was guided by requirements elicited from previous empirical analyses of workflow decay and repair. The ontologies make use of and extend well-known existing ontologies, namely the Object Reuse and Exchange (ORE) vocabulary, the Annotation Ontology (AO) and the W3C PROV ontology (PROV-O). We illustrate the application of the ontologies for building Workflow Research Objects with a case study investigating Huntington’s disease, performed in collaboration with a team from the Leiden University Medical Centre (HG-LUMC). Finally, we present a number of tools developed for creating and managing workflow-centric research objects.
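To make the idea of a research object as an aggregation more concrete, the following is a minimal sketch, using only the Python standard library, that builds RDF triples for a hypothetical workflow research object aggregating a workflow specification, an input dataset and a workflow run, and prints them as N-Triples. The namespace URIs and class names (`ro:ResearchObject`, `wfdesc:Workflow`, `wfprov:WorkflowRun`) are our assumptions about the ORE and wf4ever vocabularies and should be checked against the ontology documentation [10]; all `example.org` URIs are hypothetical.

```python
"""Sketch: a workflow research object expressed as RDF triples (stdlib only).

Assumption: the namespace URIs and class names below follow the ORE and
wf4ever (ro, wfdesc, wfprov) vocabularies; verify them against the docs.
"""

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
RO = "http://purl.org/wf4ever/ro#"
WFDESC = "http://purl.org/wf4ever/wfdesc#"
WFPROV = "http://purl.org/wf4ever/wfprov#"

# Hypothetical URIs for the research object and its aggregated resources.
RO_URI = "http://example.org/ro/demo/"
WORKFLOW = RO_URI + "workflows/main"
DATASET = RO_URI + "data/input.csv"
RUN = RO_URI + "runs/run-1"


def build_research_object():
    """Return (subject, predicate, object) triples whose terms are all URIs."""
    triples = [
        # The research object is itself an ORE aggregation.
        (RO_URI, RDF + "type", RO + "ResearchObject"),
        (RO_URI, RDF + "type", ORE + "Aggregation"),
        # Typed descriptions of the workflow specification and one of its runs.
        (WORKFLOW, RDF + "type", WFDESC + "Workflow"),
        (RUN, RDF + "type", WFPROV + "WorkflowRun"),
    ]
    # Aggregate the specification together with its data and execution record,
    # so that the workflow is not published in isolation.
    for resource in (WORKFLOW, DATASET, RUN):
        triples.append((RO_URI, ORE + "aggregates", resource))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(build_research_object()))
```

A real deployment would more likely use an RDF library and richer annotations (for example via AO), but plain triples are enough to show that the workflow, its data and its run record travel together inside one aggregation.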

Highlights

  • We present a collection of tools that use the Workflow Research Object ontologies to support and manage Workflow Research Objects.

  • We present a series of competency queries that demonstrate how Workflow Research Objects support workflow preservation.

  • We present a novel approach to scientific workflow preservation that uses a suite of ontologies for specifying Workflow Research Objects.

Summary

Introduction

As science becomes increasingly data-driven, many scientists have adopted workflows as a means to specify and automate repetitive experiments that retrieve, integrate, and analyse datasets using distributed resources [1]. Whilst the loss of third-party services is outside the control of the original authors, there are a number of approaches to remedy this type of workflow decay by making use of metadata, such as additional semantic descriptions of the services used [5] or provenance information [6,7,8], all of which can be either provided by the author of the workflow or automatically tracked and computed. In light of this, we propose a novel approach to workflow preservation in which workflow specifications are not published in isolation, but are instead accompanied by auxiliary resources and additional metadata. Our implementation of workflow-centric research objects is realised as a series of ontologies that support both a core model of aggregation and the domain-specific requirements of workflow preservation. The resources used in the paper are available online, and the ontologies are documented online [10].
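Because the auxiliary resources are described with shared vocabularies, some preservation questions can be checked mechanically. As a hedged illustration (not one of the paper's competency queries), the sketch below scans a list of triples and reports which workflows aggregated in a research object have no aggregated run recorded against them. The predicate `wfprov:describedByWorkflow` and the namespace URIs are our assumptions about the wf4ever vocabularies, the function name `workflows_without_runs` is hypothetical, and in practice such a check would more typically be written as a SPARQL query over the research object's RDF.

```python
"""Hypothetical preservation check over (subject, predicate, object) triples.

Assumption: URIs follow the ORE and wf4ever (wfdesc, wfprov) vocabularies.
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"
WFDESC_WORKFLOW = "http://purl.org/wf4ever/wfdesc#Workflow"
WFPROV_RUN = "http://purl.org/wf4ever/wfprov#WorkflowRun"
DESCRIBED_BY = "http://purl.org/wf4ever/wfprov#describedByWorkflow"


def workflows_without_runs(triples, research_object):
    """Return aggregated workflows that no aggregated workflow run points to."""
    aggregated = {o for s, p, o in triples
                  if s == research_object and p == ORE_AGGREGATES}
    # Workflow specifications that are part of this research object.
    workflows = {s for s, p, o in triples
                 if p == RDF_TYPE and o == WFDESC_WORKFLOW and s in aggregated}
    # Workflow runs that are also part of this research object.
    runs = {s for s, p, o in triples
            if p == RDF_TYPE and o == WFPROV_RUN and s in aggregated}
    # Workflows for which at least one aggregated run exists.
    traced = {o for s, p, o in triples if p == DESCRIBED_BY and s in runs}
    return sorted(workflows - traced)


if __name__ == "__main__":
    ro = "http://example.org/ro/demo/"  # hypothetical research object URI
    wf_a, wf_b, run_a = ro + "workflows/a", ro + "workflows/b", ro + "runs/a-1"
    triples = [
        (ro, ORE_AGGREGATES, wf_a),
        (ro, ORE_AGGREGATES, wf_b),
        (ro, ORE_AGGREGATES, run_a),
        (wf_a, RDF_TYPE, WFDESC_WORKFLOW),
        (wf_b, RDF_TYPE, WFDESC_WORKFLOW),
        (run_a, RDF_TYPE, WFPROV_RUN),
        (run_a, DESCRIBED_BY, wf_a),
    ]
    print(workflows_without_runs(triples, ro))
    # prints ['http://example.org/ro/demo/workflows/b']
```

Here workflow `b` is flagged because the research object holds its specification but no record of it ever being executed, which is the kind of gap that makes a shared workflow hard to understand or re-enact.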

Requirements
Workflows
Creating a workflow research object
Workflow Research Object ontologies
Specifying workflows using wfdesc
Describing workflow runs using wfprov
Describing aggregations using the ro ontology
Tracking research object evolution using the roevo ontology
The Workflow Research Object family of tools
The Research Object Manager
Example competency queries
Scientific workflow preservation
Scientific investigation preservation and packaging
Representation of packaging structure
Conclusions
Results of the example competency queries