Abstract

Background: Advances in high-throughput methods have brought new challenges for biological data analysis, often requiring many interdependent steps applied to a large number of samples. To address this challenge, workflow management systems, such as Watchdog, have been developed to support scientists in the (semi-)automated execution of large analysis workflows.

Implementation: Here, we present Watchdog 2.0, which implements new developments for module creation, reusability, and documentation and for reproducibility of analyses and workflow execution. Developments include a graphical user interface for semi-automatic module creation from software help pages, sharing repositories for modules and workflows, and a standardized module documentation format. The latter allows generation of a customized reference book of public and user-specific modules. Furthermore, extensive logging of workflow execution, module and software versions, and explicit support for package managers and container virtualization now ensures reproducibility of results. A step-by-step analysis protocol generated from the log file may, e.g., serve as a draft of a manuscript methods section. Finally, 2 new execution modes were implemented. One allows resuming workflow execution after interruption or modification without rerunning successfully executed tasks not affected by changes. The second one allows detaching and reattaching to workflow execution on a local computer while tasks continue running on computer clusters.

Conclusions: Watchdog 2.0 provides several new developments that we believe to be of benefit for large-scale bioinformatics analysis and that are not completely covered by other competing workflow management systems. The software itself, module and workflow repositories, and comprehensive documentation are freely available at https://www.bio.ifi.lmu.de/watchdog.

Highlights

  • Advances in high-throughput methods have brought new challenges for biological data analysis, often requiring many interdependent steps applied to a large number of samples

  • While the use of a workflow management system (WMS) already contributes to reproducibility, workflows may be modified between different runs of the workflow, e.g., by changing parameter values or including or excluding some steps, or the underlying software may be changed, e.g., by updates to a new version

  • We discuss how Watchdog 2.0 compares to other WMSs regarding the new features presented in this article, because these features were not previously analyzed
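The idea of resuming a modified workflow without rerunning unaffected tasks can be illustrated with a small sketch. This is not Watchdog's implementation; it is a minimal Python illustration, assuming each task can be summarized by a fingerprint of its module, version, parameters, and upstream tasks. The task layout, the field names, and the functions `task_fingerprint` and `plan_resume` are all hypothetical.

```python
import hashlib
import json


def task_fingerprint(task, upstream_fingerprints):
    """Hash a task's module, version, parameters, and upstream fingerprints."""
    payload = {
        "module": task["module"],
        "version": task["version"],
        "params": task["params"],
        "upstream": sorted(upstream_fingerprints),
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def plan_resume(workflow, previous_log):
    """Return (tasks to run, new fingerprints).

    workflow: list of task dicts in dependency order (hypothetical layout).
    previous_log: task name -> fingerprint recorded for its last successful run.
    """
    fingerprints = {}
    to_run = []
    for task in workflow:
        upstream = [fingerprints[dep] for dep in task.get("depends_on", [])]
        fp = task_fingerprint(task, upstream)
        fingerprints[task["name"]] = fp
        # A task is rerun if it never succeeded or if anything it depends on changed.
        if previous_log.get(task["name"]) != fp:
            to_run.append(task["name"])
    return to_run, fingerprints


# Hypothetical four-step workflow.
workflow = [
    {"name": "download", "module": "fetch", "version": "1", "params": {"id": "X"}},
    {"name": "align", "module": "aligner", "version": "2",
     "params": {"max_mismatches": 2}, "depends_on": ["download"]},
    {"name": "count", "module": "counter", "version": "1",
     "params": {}, "depends_on": ["align"]},
    {"name": "qc", "module": "qc", "version": "1",
     "params": {}, "depends_on": ["download"]},
]

first_run, log = plan_resume(workflow, {})      # nothing has run yet
workflow[1]["params"]["max_mismatches"] = 3     # modify one parameter
second_run, _ = plan_resume(workflow, log)
print(second_run)  # ['align', 'count']
```

In this sketch, changing a parameter of `align` invalidates `align` and its downstream task `count`, while `download` and `qc` keep their recorded fingerprints and are skipped.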


Summary

Background

As a result of improvements in sequencing technologies, sequencing costs have decreased massively in recent years [1].

Using the GUI requires neither understanding of XML nor programming skills and allows easy construction of workflows from a pre-defined set of modules. In this case, the only Watchdog syntax that has to be learned is how to reference variables.

Watchdog already provides a helper script for creating the module XSD file and (optionally) a skeleton Bash script that only has to be extended by the program call. This requires manually listing all parameters for the module.

Documentation format: Individual Watchdog modules are documented using a standardized XML format. This contains general module information (e.g., author, description, dependencies) and properties of module parameters and return values (e.g., name, type, description).

New workflows are available, e.g., for circular RNA detection with CIRI2 [22] and circRNA finder [23], ChIP-seq analysis using GEM (GEM, RRID:SCR_005339) followed by ChIPseeker [24,25], and download of public NGS data from the NCBI SRA (SRA, RRID:SCR_004891) [26] followed by alignment with HISAT2 (HISAT2, RRID:SCR_015530) [27].
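To make the standardized module documentation format more concrete, the following sketch parses a small documentation file and renders it as a plain-text reference entry. It is only an illustration: the element and attribute names, the module name `exampleAligner`, and the function `render_entry` are assumptions made for this example and are not taken from the Watchdog schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical documentation file. The element and attribute names are
# illustrative only and do not reproduce the actual Watchdog format.
DOC_XML = """
<documentation>
  <info>
    <author>Jane Doe</author>
    <description>Aligns reads to a reference genome.</description>
    <dependency>aligner-binary</dependency>
  </info>
  <parameters>
    <param name="index" type="string">Path to the genome index.</param>
    <param name="threads" type="integer">Number of threads to use.</param>
  </parameters>
  <returns>
    <value name="alignment" type="string">Path to the alignment file.</value>
  </returns>
</documentation>
"""


def render_entry(module_name, xml_text):
    """Render one module's documentation as a plain-text reference entry."""
    root = ET.fromstring(xml_text.strip())
    info = root.find("info")
    dependencies = [d.text for d in info.findall("dependency")]

    lines = [module_name, "=" * len(module_name)]
    lines.append(info.findtext("description").strip())
    lines.append("Author: " + info.findtext("author").strip())
    lines.append("Dependencies: " + (", ".join(dependencies) or "none"))

    lines.append("Parameters:")
    for param in root.findall("./parameters/param"):
        lines.append(f"  {param.get('name')} ({param.get('type')}): {param.text.strip()}")

    lines.append("Return values:")
    for value in root.findall("./returns/value"):
        lines.append(f"  {value.get('name')} ({value.get('type')}): {value.text.strip()}")

    return "\n".join(lines)


entry = render_entry("exampleAligner", DOC_XML)
print(entry)
```

Because every module exposes the same fields, entries rendered this way could be concatenated into a reference book covering both public and user-specific modules, which is the use case the abstract describes.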
