Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations.

Gloria M Sheynkman, Lloyd M Smith, Timothy J Griffin, Michael R Shortreed, James E Johnson, Pratik D Jagtap, Getiria Onsongo, Brian L Frey

doi:10.1186/1471-2164-15-703

Abstract

Background: Current practice in mass spectrometry (MS)-based proteomics is to identify peptides by comparison of experimental mass spectra with theoretical mass spectra derived from a reference protein database; however, this strategy necessarily fails to detect peptide and protein sequences that are absent from the database. We and others have recently shown that customized proteomic databases derived from RNA-Seq data can be employed for MS-searching to both improve MS analysis and identify novel peptides. While this general strategy constitutes a significant advance for the discovery of novel protein variations, it has not been readily transferable to other laboratories due to the need for many specialized software tools. To address this problem, we have implemented readily accessible, modifiable, and extensible workflows within Galaxy-P, short for Galaxy for Proteomics, a web-based bioinformatic extension of the Galaxy framework for the analysis of multi-omics (e.g. genomics, transcriptomics, proteomics) data.

Results: We present three bioinformatic workflows that allow the user to upload raw RNA sequencing reads and convert the data into high-quality customized proteomic databases suitable for MS searching. We show the utility of these workflows on human and mouse samples, identifying 544 peptides containing single amino acid polymorphisms (SAPs) and 187 peptides corresponding to unannotated splice junction peptides, correlating protein and transcript expression levels, and providing the option to incorporate transcript abundance measures within the MS database search process (reduced databases, incorporation of transcript abundance for protein identification score calculations, etc.).

Conclusions: Using RNA-Seq data to enhance MS analysis is a promising strategy to discover novel peptides specific to a sample and, more generally, to improve proteomics results. The main bottleneck for widespread adoption of this strategy has been the lack of easily used and modifiable computational tools. We provide a solution to this problem by introducing a set of workflows within the Galaxy-P framework that converts raw RNA-Seq data into customized proteomic databases.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-703) contains supplementary material, which is available to authorized users.

Highlights

Current practice in mass spectrometry (MS)-based proteomics is to identify peptides by comparison of experimental mass spectra with theoretical mass spectra derived from a reference protein database; this strategy necessarily fails to detect peptide and protein sequences that are absent from the database.
The three workflows produce three kinds of customized database: databases containing novel single amino acid polymorphisms; databases containing novel splice junction sequences; and a reduced database, which contains only protein sequences whose corresponding transcripts are expressed above a threshold level of abundance.
We demonstrated the utility of these workflows on parallel RNA-Seq and proteomics datasets collected from the same sample.
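
The reduced database mentioned above can be pictured as a simple filter over the reference proteins. The following is a minimal sketch, not the Galaxy-P workflow itself: the function name `reduce_database`, the dictionary layout, and the abundance values are all hypothetical, and the source does not state which abundance unit or threshold is used.

```python
def reduce_database(proteins, abundance, threshold):
    """Keep only proteins whose transcript abundance exceeds the threshold.

    proteins:  dict mapping a protein ID to its amino acid sequence
    abundance: dict mapping the same ID to a transcript abundance value
               (unit is an assumption, e.g. a normalized read count)
    threshold: minimum abundance; IDs with no measurement are dropped
    """
    return {
        protein_id: sequence
        for protein_id, sequence in proteins.items()
        if abundance.get(protein_id, 0.0) > threshold
    }


# Hypothetical toy data for illustration only.
reference = {"P1": "MKTAYIAK", "P2": "MVLSPADK", "P3": "MGSSHHHH"}
expression = {"P1": 12.5, "P2": 0.3}

reduced = reduce_database(reference, expression, threshold=1.0)
print(sorted(reduced))
```

Only `P1` survives here: `P2` falls below the threshold and `P3` has no measured transcript. A smaller search space of this kind is what the reduced database is meant to provide to the MS search.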

Summary

Introduction

Current practice in mass spectrometry (MS)-based proteomics is to identify peptides by comparison of experimental mass spectra with theoretical mass spectra derived from a reference protein database; this strategy necessarily fails to detect peptide and protein sequences that are absent from the database. We and others have recently shown that customized proteomic databases derived from RNA-Seq data can be employed for MS-searching to both improve MS analysis and identify novel peptides. While this general strategy constitutes a significant advance for the discovery of novel protein variations, it has not been readily transferable to other laboratories due to the need for many specialized software tools. High-throughput RNA sequencing has been used to empirically determine the transcript sequences expressed in a given sample, strain, cell line, or tissue, and has become accessible to many researchers [2,3]. Taking advantage of this powerful new capability, we and others have developed novel strategies to leverage RNA-Seq for the detection of sample-specific protein variations [4,5,6,7,8,9,10,11]. Novel sequences discovered from RNA-Seq data are translated into proteins and added to the MS search database, which can then be employed to detect the corresponding protein variations.
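
The final step described above, translating novel transcript sequences and adding them to the search database, can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: it assumes the novel sequences are already in the correct reading frame, uses the standard genetic code, and the function names and sequence IDs are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate an in-frame nucleotide sequence, stopping at the first stop codon."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def extend_database(reference, novel_transcripts):
    """Return the reference proteins plus translations not already present."""
    database = dict(reference)
    known = set(reference.values())
    for transcript_id, nucleotides in novel_transcripts.items():
        protein = translate(nucleotides)
        if protein and protein not in known:
            database[transcript_id] = protein
            known.add(protein)
    return database


def to_fasta(database):
    """Serialize the database as FASTA text for an MS search engine."""
    return "".join(f">{name}\n{seq}\n" for name, seq in database.items())


# Hypothetical toy input: one genuinely new sequence and one duplicate.
reference = {"ref1": "MA"}
novel = {"var1": "ATGGTCTAA", "dup1": "ATGGCCTAA"}
print(to_fasta(extend_database(reference, novel)))
```

The duplicate check keeps the custom database from growing with sequences the reference already covers, so only genuinely sample-specific entries are appended.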

Journal: BMC Genomics	Publication Date: Aug 22, 2014
Citations: 110	License type: cc-by
