Galaxy Integrated Omics: Web-based Standards-Compliant Workflows for Proteomics Informed by Transcriptomics

Jun Fan, Shyamasree Saha, Gary Barker, Kate J Heesom, Fawaz Ghali, Andrew R Jones, David A Matthews, Conrad Bessant

doi:10.1074/mcp.o115.048777

Abstract

With the recent advent of RNA-seq technology, the proteomics community has begun to generate sample-specific protein databases for peptide and protein identification, an approach we call proteomics informed by transcriptomics (PIT). This approach has attracted considerable interest, particularly among researchers who work with nonmodel organisms or with particularly dynamic proteomes such as those observed in developmental biology and host-pathogen studies. PIT has been shown to improve coverage of known proteins and to reveal potential novel gene products. However, many groups are impeded in their use of PIT by the complexity of the required data analysis. This analysis necessarily requires complex integration of a number of different software tools from at least two different communities, and because PIT has a range of biological applications, a single software pipeline is not suitable for all use cases. To overcome these problems, we have created GIO, a software system that uses the well-established Galaxy platform to make PIT analysis available to the typical bench scientist via a simple web interface. Within GIO we provide workflows for four common use cases: a standard search against a reference proteome; PIT protein identification without a reference genome; PIT protein identification using a genome guide; and PIT genome annotation. These workflows comprise individual tools that can be reconfigured and rearranged within the web interface to create new workflows to support additional use cases.

Highlights

From the ‡School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK; §School of Cellular and Molecular Medicine, University of Bristol, University Walk, Bristol
We sought to mitigate these problems by developing a methodology called proteomics informed by transcriptomics (PIT) [5] in which sample-specific protein databases are generated from transcripts that have been identified in the same sample using RNA-seq [6].
In this paper we introduce our solution to these challenges: a publicly available standards-compatible system called GIO (Galaxy Integrated Omics) that uses the popular Galaxy platform [7] to make flexible PIT workflows available via an easy-to-use web interface.

Summary

Introduction

We sought to mitigate these problems by developing a methodology called proteomics informed by transcriptomics (PIT) [5] in which sample-specific protein databases are generated from transcripts that have been identified in the same sample using RNA-seq [6]. These transcripts can be assembled from short reads either by mapping to a genome or entirely de novo (e.g. if no suitable genome exists). We describe the GIO implementation of these workflows and validate the efficacy of this software using a matched RNA-seq and LC-MS/MS dataset.
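To make the idea of a sample-specific protein database more concrete, the following Python sketch translates assembled transcripts in all six reading frames and writes the resulting open reading frames (ORFs) as FASTA entries that a search engine could use. It is an illustration only and is not the GIO implementation: the function names (`six_frame_orfs`, `to_fasta`), the `min_len` threshold, the requirement that an ORF start at a methionine, and the example transcript are all assumptions made for this sketch.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(transcript, min_len=3):
    """Collect ORFs (first M up to the next stop) from all six frames.

    min_len is a hypothetical length cut-off in amino acids.
    """
    seq = transcript.upper()
    orfs = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Stop codons are encoded as '*', so each segment is stop-free.
            for segment in protein.split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    orfs.append(segment[start:])
    return orfs


def to_fasta(transcripts, min_len=3):
    """Build FASTA text from a {transcript_name: sequence} mapping."""
    lines = []
    for name, seq in transcripts.items():
        for i, orf in enumerate(six_frame_orfs(seq, min_len), 1):
            lines.append(f">{name}_orf{i}")
            lines.append(orf)
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy transcript, invented for illustration.
    print(to_fasta({"transcript1": "ATGGCCAAATAG"}))
```

In practice a much larger `min_len` would be used to limit the number of spurious short ORFs in the database, and the transcripts would come from a genome-guided or de novo assembler rather than a hand-written dictionary.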

Objectives

Methods

Results

Conclusion