Discovery and Mass Spectrometric Analysis of Novel Splice-junction Peptides Using RNA-Seq

Gloria M Sheynkman, Michael R Shortreed, Brian L Frey, Lloyd M Smith

doi:10.1074/mcp.o113.028142

Open Access

https://doi.org/10.1074/mcp.o113.028142

Abstract

Human proteomic databases required for MS peptide identification are frequently updated and carefully curated, yet are still incomplete because it has been challenging to acquire every protein sequence from the diverse assemblage of proteoforms expressed in every tissue and cell type. In particular, alternative splicing has been shown to be a major source of this cell-specific proteomic variation. Many new alternative splice forms have been detected at the transcript level using next-generation sequencing methods, especially RNA-Seq, but it is not known how many of these transcripts are being translated. Leveraging the unprecedented capabilities of next-generation sequencing methods, we collected RNA-Seq and proteomics data from the same cell population (Jurkat cells) and created a bioinformatics pipeline that builds customized databases for the discovery of novel splice-junction peptides. Eighty million paired-end Illumina reads and ∼500,000 tandem mass spectra were used to identify 12,873 transcripts (19,320 including isoforms) and 6,810 proteins. We developed a bioinformatics workflow to retrieve high-confidence, novel splice-junction sequences from the RNA data, translate these sequences into the analogous polypeptide sequences, and create a customized splice-junction database for MS searching. Based on the RefSeq gene models, we detected 136,123 annotated and 144,818 unannotated transcript junctions. Of those, 24,834 unannotated junctions passed various quality filters (e.g., minimum read depth), and these entries were translated into 33,589 polypeptide sequences and used for database searching. We discovered 57 splice-junction peptides not present in the UniProt-TrEMBL proteomic database, comprising an array of different splicing events, including skipped exons, alternative donors and acceptors, and noncanonical transcriptional start sites. To our knowledge, this is the first example of using sample-specific RNA-Seq data to create a splice-junction database and discover new peptides resulting from alternative splicing.
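
The workflow described in the abstract (filter unannotated junctions, translate the junction-spanning sequence, collect the results as database entries) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `Junction` record, the `min_reads` and `min_len` thresholds, and the header format are all hypothetical, and the abstract does not state the actual filter values. The sketch translates each junction in three forward frames and keeps only the stop-free stretch that covers the junction, which is one plausible reason a single junction can yield more than one polypeptide.

```python
from dataclasses import dataclass

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


@dataclass
class Junction:
    """Hypothetical record for one splice junction found in RNA-Seq data."""
    junction_id: str
    upstream: str     # nucleotides flanking the junction on the 5' exon
    downstream: str   # nucleotides flanking the junction on the 3' exon
    reads: int        # number of reads spanning the junction


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def junction_peptides(junction, min_len):
    """Return (frame, peptide) pairs whose stop-free stretch spans the junction."""
    seq = (junction.upstream + junction.downstream).upper()
    cut = len(junction.upstream)  # index of the first downstream nucleotide
    results = []
    for frame in range(3):
        protein = translate(seq[frame:])
        j_up = (cut - 1 - frame) // 3   # last codon touching the upstream exon
        j_down = (cut - frame) // 3     # first codon touching the downstream exon
        if j_down >= len(protein) or protein[j_down] == "*":
            continue
        start = protein.rfind("*", 0, j_down) + 1
        end = protein.find("*", j_down)
        if end == -1:
            end = len(protein)
        if start > j_up:
            continue  # no residue from the upstream exon survives
        peptide = protein[start:end]
        if len(peptide) >= min_len:
            results.append((frame, peptide))
    return results


def build_database(junctions, min_reads, min_len):
    """Apply the read-depth filter, then emit (header, peptide) entries."""
    entries = []
    for junction in junctions:
        if junction.reads < min_reads:
            continue
        for frame, peptide in junction_peptides(junction, min_len):
            entries.append((f"{junction.junction_id}|frame{frame}", peptide))
    return entries
```

Each `(header, peptide)` pair could then be written out as a FASTA record and appended to the protein database used for the MS search.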

Highlights

Mass spectrometry-based proteomics relies on accurate databases to identify and quantify proteins, including those derived from splice variants, indels, and single nucleotide variants (SNVs) [1].
The abbreviations used are: SNV, single nucleotide variant; cDNA, complementary DNA; FASP, filter aided sample preparation; GENCODE, component of the ENCODE project that aims to build accurate human reference annotations; GTF, gene annotation file; ppm, parts per million; PSI, percentage spliced in; RNA-Seq, RNA sequencing; RSEM, RNA-Seq by Expectation Maximization; SDT, SDS and DTT-based buffer used in the FASP protocol, containing SDS and dithiothreitol; TPM, transcripts per million; XCorr, SEQUEST cross-correlation score.
The most common splicing events were small insertions and deletions occurring at the 3′ acceptor exons, frequently characterized by the NAGNAG motif, in which two AG dinucleotide splice-site acceptors sit in close proximity to each other. This agrees with recent gene validation efforts of the GENCODE gene annotation project, in which mass spectrometry data retrieved from the Global Proteome Machine (GPM) and PeptideAtlas were aligned to GENCODE gene models to assess the number of translated products [17].
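
As an illustration of the NAGNAG motif mentioned above, the following sketch checks whether an annotated 3′ acceptor site has a second AG acceptor exactly three nucleotides away, which would add or remove one codon in frame. The function name, its arguments, and the coordinate convention (`exon_start` is the 0-based index of the first exonic base) are assumptions made for this example and are not taken from the paper.

```python
def nagnag_alternatives(seq, exon_start):
    """Return offsets (-3 and/or +3) of tandem AG acceptors around an acceptor site.

    seq        : genomic sequence on the transcribed strand
    exon_start : 0-based index of the first exonic base, so the annotated
                 acceptor AG occupies seq[exon_start - 2:exon_start]
    """
    seq = seq.upper()
    if exon_start < 2 or seq[exon_start - 2:exon_start] != "AG":
        return []  # not a canonical AG acceptor
    offsets = []
    # An AG three bases upstream would extend the exon by three nucleotides.
    if exon_start >= 5 and seq[exon_start - 5:exon_start - 3] == "AG":
        offsets.append(-3)
    # An AG three bases downstream would shorten the exon by three nucleotides.
    if seq[exon_start + 1:exon_start + 3] == "AG":
        offsets.append(3)
    return offsets
```

For the toy sequence `TTTCAGCAGGTC`, the motif `CAGCAG` occupies positions 3 to 8, so an exon annotated to start at index 6 has an alternative acceptor at +3, and an exon annotated to start at index 9 has one at -3.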

Summary

Technological Innovation and Resources

Though the focus of this paper is on the study of alternative splice junctions, other bioinformatics strategies to extract information from RNA-Seq data have been employed to create customized mass spectrometry databases. These include reducing a database to only include sequences with transcript expression evidence [40], including fusion or chimeric sequences [44], incorporating nonsynonymous single nucleotide polymorphism (SNP) or SNV sequences [40], and, for non-model systems, building a proteomic database from de novo assembled transcripts [45, 46]. We discovered 57 splice-junction peptides not present in the UniProt-TrEMBL proteomic database using appropriately stringent MS search parameters and post-processing steps, including the use of a conservative 1% local false discovery rate and manual validation of junction peptide MS2 spectra. To our knowledge, this is the first example of using sample-specific RNA-Seq data to discover new peptides resulting from alternative splicing.
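
The summary does not say how the 1% local false discovery rate was computed. One simple way to approximate a local FDR, assuming a target-decoy search was used, is to bin peptide-spectrum matches by score and take the ratio of decoy to target hits within each bin. The sketch below follows that assumption; the function names, the `(score, is_decoy)` tuple layout, and the `bin_width` parameter are hypothetical.

```python
import math
from collections import defaultdict


def local_fdr_by_bin(psms, bin_width):
    """Estimate a local FDR per score bin as decoys / targets in that bin.

    psms : iterable of (score, is_decoy) pairs from a target-decoy search
    """
    targets = defaultdict(int)
    decoys = defaultdict(int)
    for score, is_decoy in psms:
        bin_index = math.floor(score / bin_width)
        if is_decoy:
            decoys[bin_index] += 1
        else:
            targets[bin_index] += 1
    fdr = {}
    for bin_index in set(targets) | set(decoys):
        t, d = targets[bin_index], decoys[bin_index]
        # A bin with no target hits carries no usable evidence; treat it as 100% FDR.
        fdr[bin_index] = 1.0 if t == 0 else min(1.0, d / t)
    return fdr


def accept_targets(psms, bin_width, max_local_fdr=0.01):
    """Keep target scores whose bin-level local FDR is at or below the cutoff."""
    fdr = local_fdr_by_bin(psms, bin_width)
    return [score for score, is_decoy in psms
            if not is_decoy
            and fdr[math.floor(score / bin_width)] <= max_local_fdr]
```

Unlike a global FDR, which is averaged over every match above a threshold, this per-bin estimate penalizes matches that sit in a score range crowded with decoys, which is why a local cutoff is usually the more conservative choice.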

Journal: Molecular & Cellular Proteomics	Publication Date: Aug 1, 2013
Citations: 120	License type: cc-by
