Perspectives on automated composition of workflows in the life sciences.

Anna‐Lena Lamprecht, Robert Stevens, Magnus Palmblad, Aswin Verhoeven, Tobias Kuhn, Yolanda Gil, Hervé Ménager, Alireza Khanteymoori, Mohammad Sadnan Al Manir, Timothy J Griffin, Vedran Kasalica, Carole Goble, Suzan Verberne, Salvador Capella-Gutierrez, Matúš Kalaš, Steffen Möller, Veit Schwämmle, Vincent Robert, Michael R Crusoe, Paulos Charonyktakis, Paul Groth, Hans Ienasescu, Szániszló Szöke, Jon C Ison, Christopher J O Baker, Robin A Richardson, Pratik Jagtap, Ilkay Altıntaş, Ammar Ben Hadj Amor, Stian Soiland‐Reyes, Hailiang Mei, Katy Wolstencroft

doi:10.12688/f1000research.54159.1

Abstract

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. 
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
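
The six life cycle stages named in the abstract can be written down as an ordered type. The following is only an illustrative sketch: the class name `WorkflowStage` and the helpers `next_stage` and `transitions` are hypothetical and do not come from the paper or from any workflow system.

```python
from enum import IntEnum
from typing import List, Optional, Tuple


class WorkflowStage(IntEnum):
    """The six workflow life cycle stages, in the order given in the abstract."""
    SCIENTIFIC_QUESTION = 1   # scientific question or hypothesis
    CONCEPTUAL_WORKFLOW = 2
    ABSTRACT_WORKFLOW = 3
    CONCRETE_WORKFLOW = 4
    PRODUCTION_WORKFLOW = 5
    SCIENTIFIC_RESULTS = 6


def next_stage(stage: WorkflowStage) -> Optional[WorkflowStage]:
    """Return the stage that follows `stage`, or None after the final stage."""
    if stage is WorkflowStage.SCIENTIFIC_RESULTS:
        return None
    return WorkflowStage(stage + 1)


def transitions() -> List[Tuple[WorkflowStage, WorkflowStage]]:
    """List the consecutive stage pairs, i.e. the transitions between stages."""
    stages = list(WorkflowStage)
    return list(zip(stages, stages[1:]))


if __name__ == "__main__":
    for source, target in transitions():
        print(f"{source.name} -> {target.name}")
```

Each pair returned by `transitions()` corresponds to one of the transitions that, according to the abstract, is facilitated by tools and methods incorporating domain knowledge.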

Highlights

Computational pipelines, commonly referred to as scientific workflows, play a key role in modern life science research.[1,2,3] Analyses must be tailored to highly complex biological data by successive application of different algorithms and routines to maximize biological insight.
In highly collaborative community efforts like EDAM/bio.tools, OntoSoft and SADI, it is important to realize that the controlled vocabulary defined by the domain ontology constitutes a kind of social contract that all tool annotators must understand and respect.
Although similar ideas and efforts have struggled to find widespread application in the past, the attendees left the workshop with renewed confidence and optimism that we are at least considerably closer, having clearly identified what development of community standards, ontologies and annotations is still needed to achieve broad adoption of automated workflow composition techniques across the life sciences.

Summary

Introduction

Computational pipelines, commonly referred to as scientific workflows, play a key role in modern life science research.[1,2,3] Analyses must be tailored to highly complex biological data by successive application of different algorithms and routines to maximize biological insight. EDAM is continually evolving based on input from the bioinformatics and, in particular, the bio.tools community. It is, for example, well developed for the proteomics domain, due to recent work on (automated) workflow composition and benchmarking. A somewhat lower annotation quality seems to be tolerable for assisted workflow composition, as the developer can correct or discard suggestions based on their domain knowledge. This is the case, for example, when using a tool recommender system, like that in Galaxy, where the user can at any point decide whether or not to follow the recommendation. Semi-automated approaches such as those in APE and WINGS require higher-quality semantic annotations, but as the workflow developer still has the possibility to check and revise the workflow before execution, they can tolerate medium-quality annotations to some extent. Complete automation is possible for specific application areas or use cases with well-defined domain knowledge and high-quality annotations. Further metrics and criteria to base recommendations on are possible (such as a functional similarity index, compatibility, citation index or novelty), but in any case they should be made transparent to the user and raise awareness of possible biases.
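
To make the role of semantic annotations concrete, here is a minimal sketch of annotation-driven composition: each tool is annotated with one input and one output data type, and a breadth-first search chains tools whose types match. This is a deliberately simplified illustration and not the algorithm used by APE, WINGS, or the Galaxy recommender. The tool names and type labels are hypothetical stand-ins, not real bio.tools entries or EDAM terms.

```python
from collections import deque
from typing import List, NamedTuple, Optional


class Tool(NamedTuple):
    """A tool annotated with the data type it consumes and the type it produces."""
    name: str
    input_type: str
    output_type: str


# Hypothetical annotations, loosely inspired by a proteomics analysis.
TOOLS = [
    Tool("peak_picker", "raw_spectra", "peak_list"),
    Tool("db_search", "peak_list", "peptide_ids"),
    Tool("protein_inference", "peptide_ids", "protein_ids"),
]


def compose(tools: List[Tool], source: str, target: str) -> Optional[List[str]]:
    """Return a shortest chain of tool names turning `source` into `target`.

    Returns None when the annotations admit no such chain.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        if current == target:
            return path
        for tool in tools:
            # A tool is applicable when its annotated input matches the current type.
            if tool.input_type == current and tool.output_type not in seen:
                seen.add(tool.output_type)
                queue.append((tool.output_type, path + [tool.name]))
    return None


if __name__ == "__main__":
    print(compose(TOOLS, "raw_spectra", "protein_ids"))
```

The sketch also shows why annotation quality matters: a single wrong or missing `input_type` or `output_type` either hides a valid chain or produces an invalid one, which is easy to catch when a developer reviews suggestions but not when composition is fully automated.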

Journal: F1000Research	Publication Date: Sep 7, 2021
Citations: 10	License type: CC BY 4.0
