Abstract

Background

Massive growth in the amount of research data and computational analysis has led to increased use of pipeline managers in biomedical computational research. However, each of the >100 such managers uses its own way to describe pipelines, leading to difficulty porting workflows to different environments and therefore poor reproducibility of computational studies. For this reason, the Common Workflow Language (CWL) was recently introduced as a specification for platform-independent workflow description, and work began to transition existing pipelines and workflow managers to CWL.

Findings

Herein, we present CWL-Airflow, a package that adds support for CWL to the Apache Airflow pipeline manager. CWL-Airflow uses CWL version 1.0 specification and can run workflows on stand-alone MacOS/Linux servers, on clusters, or on a variety of cloud platforms. A sample CWL pipeline for processing of chromatin immunoprecipitation sequencing data is provided.

Conclusions

CWL-Airflow will provide users with the features of a fully fledged pipeline manager and the ability to execute CWL workflows anywhere Airflow can run, from a laptop to a cluster or cloud environment. CWL-Airflow is available under Apache License, version 2.0 (Apache-2.0), and can be downloaded from https://barski-lab.github.io/cwl-airflow, https://scicrunch.org/resolver/RRID:SCR_017196.

Highlights

  • Massive growth in the amount of research data and computational analysis has led to increased utilization of pipeline managers in biomedical computational research

  • The Apache Airflow code is extended with a Python package that defines four basic classes: CWLStepOperator, JobDispatcher, JobCleanup, and CWLDAG

  • While periodically loading directed acyclic graphs (DAGs) from the DAGs folder, the Airflow scheduler runs the cwl_dag.py script and creates DAGs based on the available jobs and the corresponding Common Workflow Language (CWL) workflow descriptor files


Summary

Introduction

Massive growth in the amount of research data and computational analysis has led to increased utilization of pipeline managers in biomedical computational research. Each of more than 100 such managers uses its own way to describe pipelines, leading to difficulty porting workflows to different environments and poor reproducibility of computational studies. Even when the tools are published, the lack of a precise description of the operating system environment and component software versions can lead to inaccurate reproduction of the analyses, or to analyses failing altogether when executed in a different environment. To ameliorate this situation, a team of researchers and software developers formed the Common Workflow Language (CWL) working group [3] with the intent of establishing a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments. Researchers using CWL are able to deposit descriptions of their tools and workflows into a repository (e.g., dockstore.org) upon publication, making their analyses reusable by others.

