Abstract

It is essential for the advancement of science that researchers share, reuse and reproduce each other's workflows and protocols. The FAIR principles are a set of guidelines that aim to maximize the value and usefulness of research data, and emphasize the importance of making digital objects findable and reusable by others. The question of how to apply these principles not just to data but also to the workflows and protocols that consume and produce them is still under debate and poses a number of challenges. In this paper we describe a two-fold approach of simultaneously applying the FAIR principles to scientific workflows as well as the involved data. We apply and evaluate our approach on the case of the PREDICT workflow, a highly cited drug repurposing workflow. This includes FAIRification of the involved datasets, as well as applying semantic technologies to represent and store data about the detailed versions of the general protocol, of the concrete workflow instructions, and of their execution traces. We propose a semantic model to address these specific requirements and evaluate it by answering competency questions. This semantic model consists of classes and relations from a number of existing ontologies, including Workflow4ever, PROV, EDAM, and BPMN. This allowed us to formulate and answer new kinds of competency questions. Our evaluation shows the high degree to which our FAIRified OpenPREDICT workflow now adheres to the FAIR principles, and the practicality and usefulness of being able to answer our new competency questions.

Highlights

  • Reproducible results are one of the main goals of science

  • We evaluate our approach with a case study of a computational drug repurposing method based on machine learning, called PREDICT (Gottlieb et al., 2011)

  • FAIR applies to the datasets that a scientific workflow consumes and produces, e.g., the protein–protein interactions dataset used by OpenPREDICT, and we need FAIRification approaches to raise existing datasets to this standard


INTRODUCTION

Reproducible results are one of the main goals of science. A recent survey showed that more than 70% of researchers have been unsuccessful in reproducing another research experiment, and more than 50% failed to reproduce their own research studies (Baker, 2016). We evaluate our method by applying it to the PREDICT drug repositioning workflow. Based on this example, we will try to answer our research question of how we can use existing vocabularies and techniques to make scientific workflows more open and FAIR, with a particular focus on the interoperability aspect. The main contributions of this paper are (a) general guidelines to make scientific workflows open and FAIR, focusing on the interoperability aspect, (b) the OpenPREDICT use case, demonstrating the open and FAIR version of the PREDICT workflow, (c) new competency questions for previously unaddressed reproducibility requirements, and (d) evaluation results on the practicality and usefulness of our approach.
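To give an intuition for what answering a competency question over workflow provenance looks like, the following is a minimal, self-contained sketch. It uses a toy in-memory triple store with PROV-style predicates; all resource names (`run1`, `ppi_dataset_v1`, `predictions_v1`) are invented for illustration and are not taken from the actual OpenPREDICT data, which in practice would be queried with SPARQL over an RDF store.

```python
# Toy triple store of PROV-style provenance statements about a
# hypothetical workflow execution (all identifiers are invented).
triples = [
    ("run1", "rdf:type", "prov:Activity"),
    ("run1", "prov:used", "ppi_dataset_v1"),
    ("run1", "prov:wasAssociatedWith", "researcher_a"),
    ("predictions_v1", "prov:wasGeneratedBy", "run1"),
]

def query(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Competency question: which execution generated 'predictions_v1',
# and which datasets did that execution consume?
run = query(s="predictions_v1", p="prov:wasGeneratedBy")[0][2]
inputs = [o for _, _, o in query(s=run, p="prov:used")]
print(run, inputs)  # run1 ['ppi_dataset_v1']
```

The same pattern-matching idea underlies SPARQL basic graph patterns, which is how such questions are posed against the real semantic model.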

BACKGROUND
Feature Generation
Model Evaluation
Findings
DISCUSSION
CONCLUSIONS