Using prototyping to choose a bioinformatics workflow management system.

Michael Jackson, Edward W J Wallace, Kostas Kavoussanakis

doi:10.1371/journal.pcbi.1008622


Open Access

https://doi.org/10.1371/journal.pcbi.1008622


Journal: PLOS Computational Biology
Publication Date: Feb 25, 2021
Citations: 22
License type: CC BY 4.0

Affiliation: University of Edinburgh

Abstract

Workflow management systems represent, manage, and execute multistep computational analyses and offer many benefits to bioinformaticians. They provide a common language for describing analysis workflows, contributing to reproducibility and to building libraries of reusable components. They can support both incremental build and re-entrancy: the ability to selectively re-execute parts of a workflow in the presence of additional inputs or changes in configuration, and to resume execution from where a workflow previously stopped. Many workflow management systems enhance portability by supporting the use of containers, high-performance computing (HPC) systems, and clouds. Most importantly, workflow management systems allow bioinformaticians to delegate how their workflows are run to the workflow management system and its developers. This frees the bioinformaticians to focus on what these workflows should do, on their data analyses, and on their science.

RiboViz is a package to extract biological insight from ribosome profiling data to help advance understanding of protein synthesis. At the heart of RiboViz is an analysis workflow, implemented in a Python script. To conform to best practices for scientific computing, which recommend the use of build tools to automate workflows and to reuse code instead of rewriting it, the authors reimplemented this workflow within a workflow management system. To select a workflow management system, a rapid survey of available systems was undertaken, and candidates were shortlisted: Snakemake, cwltool, Toil, and Nextflow. Each candidate was evaluated by quickly prototyping a subset of the RiboViz workflow, and Nextflow was chosen. The selection process took 10 person-days, a small cost for the assurance that Nextflow satisfied the authors’ requirements. The use of prototyping can offer a low-cost way of making a more informed selection of software to use within projects, rather than relying solely upon reviews and recommendations by others.

Highlights

Bioinformatics data analysis takes many steps, and a crucial but frustrating part of bioinformatics work is to run the right processing steps, in the right order, on the right data, reliably [1]
We describe the process that we used for selecting a workflow management system for our ribosome profiling software, RiboViz [5]
Our evaluation focused on quickly prototyping a subset of the RiboViz workflow into each system

Summary

Introduction

Bioinformatics data analysis takes many steps, and a crucial but frustrating part of bioinformatics work is to run the right processing steps, in the right order, on the right data, reliably [1]. These steps will involve disparate pieces of software from different sources, all run from the command line. Image analysis can involve many steps applied to large numbers of images. Success in these multistep data analyses generally requires writing a script to automate the steps. Bash scripts do not support re-entrancy or incremental build unless these functionalities are explicitly implemented by their authors, which can be a nontrivial development activity.
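To make concrete what implementing incremental build by hand involves, here is a minimal sketch in Python of a timestamp-based check, in the spirit of Make. It is not taken from RiboViz or from any of the workflow management systems discussed; the names `is_up_to_date` and `run_step` are hypothetical, and the sketch assumes that each step reads and writes ordinary files whose modification times can be trusted.

```python
import os
from typing import Callable, Sequence


def is_up_to_date(inputs: Sequence[str], outputs: Sequence[str]) -> bool:
    """Return True if every output exists and is at least as new as every input.

    Hypothetical helper: a step that declares no outputs is never considered
    up to date, so it always runs.
    """
    if not outputs or not all(os.path.exists(p) for p in outputs):
        return False
    if not inputs:
        return True
    newest_input = max(os.path.getmtime(p) for p in inputs)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return oldest_output >= newest_input


def run_step(
    name: str,
    inputs: Sequence[str],
    outputs: Sequence[str],
    action: Callable[[], None],
) -> bool:
    """Run `action` only if its outputs are missing or stale.

    Returns True if the action was executed, False if it was skipped.
    """
    if is_up_to_date(inputs, outputs):
        print(f"skip {name}: outputs are up to date")
        return False
    print(f"run  {name}")
    action()
    return True
```

Calling `run_step` for each step in order gives a crude form of both properties: a rerun after a failure skips the steps that already completed, and changing an input file triggers only the steps whose outputs are older than it. Even this sketch ignores changes in configuration, partially written outputs, and parallel execution, which is part of why hand-rolled support is nontrivial.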



