Abstract

Galaxy provides an accessible platform where multi-step data analysis workflows integrating disparate software can be run, even by researchers with limited programming expertise. Such sophisticated workflows have many applications, including those that integrate software from different 'omic domains (e.g. genomics, proteomics, metabolomics). In these complex workflows, intermediate outputs are often generated as tabular text files, which must be transformed into customized formats compatible with the next software tools in the pipeline. Consequently, many text manipulation steps are added to an already complex workflow, overly complicating the process. In some cases, the limitations of existing text manipulation tools are such that desired analyses can only be carried out using highly sophisticated processing steps beyond the reach of even advanced users and developers. For users with some SQL knowledge, these text operations could instead be combined into a single, concise query on a relational database. As a solution, we have developed the Query Tabular Galaxy tool, which leverages a SQLite database generated from tabular input data. This database can be queried and manipulated to produce transformed and customized tabular outputs compatible with downstream processing steps. Regular expressions can also be used for more sophisticated manipulations, such as find-and-replace and other filtering actions. Using several Galaxy-based multi-omic workflows as examples, we demonstrate how the Query Tabular tool dramatically streamlines and simplifies the creation of multi-step analyses, efficiently enabling complicated textual manipulations and processing. This tool should find broad utility for users of the Galaxy platform seeking to develop and use sophisticated workflows involving text manipulation on tabular outputs.
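
As a rough illustration of collapsing several text manipulation steps into one SQL query, the Python sketch below loads two tab-separated intermediate outputs into an in-memory SQLite database. It then answers a question with a single query that filters, joins, groups and sorts. This is not the Query Tabular implementation. The table names, column names, example data and the 0.8 score cut-off are hypothetical. Only Python's standard sqlite3, csv and io modules are used.

```python
import csv
import io
import sqlite3


def load_tabular(conn, table, text):
    """Load tab-separated text whose first line is a header into a new table.

    Every column is stored as TEXT (a simplifying assumption); queries CAST
    values where a numeric comparison is needed.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)


# Hypothetical intermediate outputs from two upstream tools.
psms = "peptide\tprotein\tscore\nPEPTIDEA\tP1\t0.99\nPEPTIDEB\tP2\t0.42\nPEPTIDEC\tP1\t0.87\n"
taxa = "protein\ttaxon\nP1\tBacteroides\nP2\tClostridium\n"

conn = sqlite3.connect(":memory:")
load_tabular(conn, "psms", psms)
load_tabular(conn, "taxa", taxa)

# One query stands in for separate filter, join, group and sort steps.
query = """
SELECT t.taxon, COUNT(*) AS n_peptides
FROM psms AS p
JOIN taxa AS t ON p.protein = t.protein
WHERE CAST(p.score AS REAL) >= 0.8
GROUP BY t.taxon
ORDER BY n_peptides DESC
"""

for row in conn.execute(query):
    print("\t".join(str(v) for v in row))
```

Here the expected output is a single line: Bacteroides, a tab, and 2. Storing every column as TEXT keeps the loader simple. The cost is an explicit CAST wherever a numeric comparison is needed.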

Highlights

  • The Galaxy platform[1] offers a highly flexible bioinformatics workbench in which disparate software tools can be deployed and integrated into sophisticated workflows

  • Leveraging a SQLite database and regular expressions, the Query Tabular tool can minimize the need for lengthy workflows built from conventional Galaxy-based text manipulation tools

  • We provide use-case examples from multi-omics that demonstrate the value of Query Tabular in this way
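
The core SQLite library does not define a regexp() function by default. Its REGEXP operator is syntax for calling a user-supplied regexp(pattern, value). The sketch below shows one way to expose Python's re module to SQL queries through sqlite3's create_function, so that filtering and find-and-replace can happen inside a query. The re_sub function name, the hits table and its contents are hypothetical. This is not necessarily how Query Tabular itself registers regular expression functions.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "value REGEXP pattern" as regexp(pattern, value).
    return value is not None and re.search(pattern, value) is not None


def re_sub(value, pattern, replacement):
    # Hypothetical helper: regular-expression find-and-replace on one field.
    return None if value is None else re.sub(pattern, replacement, value)


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.create_function("re_sub", 3, re_sub)

conn.execute("CREATE TABLE hits (accession TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [
        ("sp|P69905|HBA_HUMAN", "Hemoglobin subunit alpha"),
        ("tr|A0A024R161|A0A024R161_HUMAN", "Guanine nucleotide-binding protein"),
        ("DECOY_sp|P68871|HBB_HUMAN", "Hemoglobin subunit beta"),
    ],
)

# Drop decoy rows, then reduce UniProt-style identifiers to the bare accession.
query = r"""
SELECT re_sub(accession, '^(?:sp|tr)\|([^|]+)\|.*$', '\1') AS acc, description
FROM hits
WHERE NOT accession REGEXP '^DECOY_'
ORDER BY rowid
"""
rows = conn.execute(query).fetchall()
```

After running, rows holds the two non-decoy entries, with their accessions reduced to P69905 and A0A024R161.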

Introduction

The Galaxy platform[1] offers a highly flexible bioinformatics workbench in which disparate software tools can be deployed and integrated into sophisticated workflows. These workflows contain many steps and different software tools, producing many different types of outputs. The results output by one software tool often take the form of a tabular file, which serves as input to a subsequent tool in the workflow. To enable compatibility between the software tools composing a proteogenomics workflow, for example, tabular files often must be manipulated into the formats recognized by specific tools. Another example is Galaxy workflows for metaproteomics[5,6], a multi-omics analysis that requires text manipulations in workflows integrating metagenomic, MS-based proteomics and other functional and taxonomic software tools. Query Tabular is available through the Galaxy Tool Shed and should prove highly useful to a broad community of Galaxy users.
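
A single SQL query can handle the kind of reformatting described above. That means selecting, reordering, renaming, filtering and sorting columns so that one tool's tabular output matches what the next tool expects. In this hypothetical sketch, an upstream tool emits headerless rows of scan, peptide, charge and q-value. The generic table and column names (t1, c1 to c4), the 0.01 q-value threshold and the target layout are assumptions chosen for illustration. They are not the formats of any particular Galaxy tool.

```python
import csv
import io
import sqlite3

# Hypothetical headerless output of an upstream tool: scan, peptide, charge, q-value.
upstream = "scan_12\tPEPTIDEK\t2\t0.004\nscan_13\tSAMPLER\t3\t0.020\nscan_15\tLMNOPQR\t2\t0.001\n"

conn = sqlite3.connect(":memory:")
# Column affinities convert the text fields to INTEGER/REAL on insert.
conn.execute("CREATE TABLE t1 (c1 TEXT, c2 TEXT, c3 INTEGER, c4 REAL)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    csv.reader(io.StringIO(upstream), delimiter="\t"),
)

# Reorder and rename columns, drop the charge column, filter and sort in one step.
cursor = conn.execute(
    """
    SELECT c2 AS peptide, c1 AS scan, c4 AS qvalue
    FROM t1
    WHERE c4 <= 0.01
    ORDER BY c4
    """
)

# Write the result as tab-separated text with a header line for the next tool.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow([d[0] for d in cursor.description])
writer.writerows(cursor)
print(out.getvalue(), end="")
```

The printed result is a header line followed by the two rows that pass the threshold, LMNOPQR and then PEPTIDEK, in ascending q-value order. In a conventional Galaxy workflow the same result could require separate column-cutting, filtering, sorting and header-adding steps.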
