Abstract
Motivation: Numerous software utilities operating on mass spectrometry (MS) data are described in the literature and provide specific operations as building blocks for the assembly of on-purpose workflows. Working out which tools and combinations are applicable or optimal in practice is often hard. Thus researchers face difficulties in selecting practical and effective data analysis pipelines for a specific experimental design.
Results: We provide a toolkit to support researchers in identifying, comparing and benchmarking multiple workflows from individual bioinformatics tools. Automated workflow composition is enabled by the tools’ semantic annotation in terms of the EDAM ontology. To demonstrate the practical use of our framework, we created and evaluated a number of logically and semantically equivalent workflows for four use cases representing frequent tasks in MS-based proteomics. Indeed we found that the results computed by the workflows could vary considerably, emphasizing the benefits of a framework that facilitates their systematic exploration.
Availability and implementation: The project files and workflows are available from https://github.com/bio-tools/biotoolsCompose/tree/master/Automatic-Workflow-Composition.
Supplementary information: Supplementary data are available at Bioinformatics online.
Highlights
Biological research today routinely involves the application of multiple, diverse computational methods in a sequence of operations to convert raw measurements into condensed results for biological interpretation
We explore the value of formalized semantic tool descriptions for guided construction of practical workflows for mass spectrometry (MS)-based proteomics
We have shown that the specification of operations, data types and formats enables the identification of compatible tools and composition of a set of tentatively viable workflows as permutations of a data analysis plan
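As an illustration of the idea in the last highlight, the following minimal Python sketch shows how annotating tools with an operation, an input and an output could be enough to enumerate candidate workflows for a plan. It is not the authors' implementation: the tool names (SearchA, ValidateB and so on), the plain-string labels standing in for EDAM terms, the `compose` function, and the simplification to one input and one output per tool are all assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Tool:
    """A tool annotated with its operation and its (data type, format) I/O."""
    name: str
    operation: str
    input: tuple
    output: tuple


# Hypothetical tools; the labels are illustrative strings, not EDAM identifiers.
TOOLS = [
    Tool("SearchA", "peptide identification",
         ("mass spectrum", "mzML"), ("peptide identification", "pepXML")),
    Tool("SearchB", "peptide identification",
         ("mass spectrum", "mzML"), ("peptide identification", "mzIdentML")),
    Tool("ValidateA", "validation",
         ("peptide identification", "pepXML"), ("peptide identification", "pepXML")),
    Tool("ValidateB", "validation",
         ("peptide identification", "mzIdentML"), ("peptide identification", "mzIdentML")),
    Tool("InferA", "protein inference",
         ("peptide identification", "pepXML"), ("protein report", "protXML")),
    Tool("InferB", "protein inference",
         ("peptide identification", "mzIdentML"), ("protein report", "mzIdentML")),
]


def compose(plan, start, tools=TOOLS):
    """Return every chain of tool names that realizes the plan step by step.

    plan  -- ordered list of operation labels
    start -- (data type, format) of the data available at the beginning
    """
    # For each step of the plan, collect the tools annotated with that operation.
    candidates = [[t for t in tools if t.operation == op] for op in plan]
    workflows = []
    for chain in product(*candidates):
        current = start
        compatible = True
        for tool in chain:
            # A tool fits only if it accepts exactly what the previous step produced.
            if tool.input != current:
                compatible = False
                break
            current = tool.output
        if compatible:
            workflows.append([t.name for t in chain])
    return workflows


PLAN = ["peptide identification", "validation", "protein inference"]
START = ("mass spectrum", "mzML")

for workflow in compose(PLAN, START):
    print(" -> ".join(workflow))
```

Of the eight tool combinations that match the plan's operations, only the two whose data types and formats line up at every step are kept. Such chains are tentatively viable only: whether they run and how their results compare still has to be established by executing and benchmarking them.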
Summary
Biological research today routinely involves the application of multiple, diverse computational methods in a sequence of operations to convert raw measurements into condensed results for biological interpretation. In the provision of these methods as application software, we discern two opposing paradigms. Integrated software packages provide the scientist with a convenient one-stop shop, with user-friendly but often limited functionality that is usually operated through a graphical user interface. The contrasting paradigm encapsulates one or a few closely related methods into discrete, stand-alone tools, providing the expert user with a powerful command-line interface. Such tools excel as remixable components in automatic data analysis pipelines for high-throughput processing.
© The Author(s) 2018.