Abstract

Integration of transcriptome data is a crucial step for the identification of rare protein variants in mass-spectrometry (MS) data with important consequences for all branches of biotechnology research. Here, we used Splooce, a database of splicing variants recently developed by us, to search MS data derived from a variety of human tumor cell lines. More than 800 new protein variants were identified whose corresponding MS spectra were specific to protein entries from Splooce. Although the types of splicing variants (exon skipping, alternative splice sites and intron retention) were found at the same frequency as in the transcriptome, we observed a large variety of modifications at the protein level induced by alternative splicing events. Surprisingly, we found that 40% of all protein modifications induced by alternative splicing led to the use of alternative translation initiation sites. Other modifications include frameshifts in the open reading frame and inclusion or deletion of peptide sequences. To make the dataset generated here available to the community in a more effective form, the Splooce portal (http://www.bioinformatics-brazil.org/splooce) was modified to report the alternative splicing events supported by MS data.

Highlights

  • The development of large-scale technologies, including genomics, has revolutionized life sciences

  • Identification of splicing variants in the MS/MS data: Splooce was used as a source to create a database of predicted protein isoforms in FASTA format, against which the MS/MS spectra were searched

  • Files from a publication that reported a good level of instrument sensitivity and proteomic depth (Geiger et al, 2012) were used, and the MS dataset was searched against the Splooce-derived protein sequences using two peptide identification approaches, one based on a probabilistic method and the other on de novo sequencing (Fig. 1)
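
To illustrate what it means for a peptide to be specific to a variant database, the following is a minimal sketch, not the pipeline used in the study. It assumes a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) and compares in-silico digests of a reference set and a variant set. The function names, the `min_len` cutoff, and the toy sequences are all hypothetical; the actual work matched experimental spectra using probabilistic and de novo search methods, which this sketch does not attempt.

```python
import re


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            header = line[1:].split()[0]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def tryptic_peptides(sequence, min_len=6):
    """In-silico digest: cleave after K or R unless the next residue is P.

    Simplified assumption: no missed cleavages, no modifications.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_len}


def variant_specific_peptides(reference, variants, min_len=6):
    """Return, per variant entry, the peptides absent from every reference entry."""
    reference_peptides = set()
    for seq in reference.values():
        reference_peptides |= tryptic_peptides(seq, min_len)
    return {
        name: tryptic_peptides(seq, min_len) - reference_peptides
        for name, seq in variants.items()
    }


# Toy data (hypothetical): the variant lacks an internal segment of the
# reference, so one peptide spans the new junction.
reference_fasta = """>toy_reference
MASTLLEKGGSPQWERTTNVLSDK
"""
variant_fasta = """>toy_variant exon-skipping example
MASTLLEKGGSPVLSDK
"""

reference = parse_fasta(reference_fasta)
variants = parse_fasta(variant_fasta)
specific = variant_specific_peptides(reference, variants)
print(specific)
```

In this toy case only the junction-spanning peptide is reported for the variant entry, because the unchanged N-terminal peptide also occurs in the reference digest. A spectrum matching such a peptide is the kind of evidence that would support a splicing variant at the protein level.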


Summary

Introduction

The development of large-scale technologies, including genomics, has revolutionized life sciences. The sequencing of the human genome in 2001 was a milestone in the characterization of our genetic framework (Lander et al, 2001; Venter et al, 2001). The advancement of sequencing technologies in the last few years has allowed the genome sequencing of more than a thousand human individuals (1000 Genomes Project) (The 1000 Genomes Project Consortium, 2012). The characterization of the transcriptome was facilitated by these new sequencing technologies. RNA-Seq techniques have allowed the identification of transcripts with low copy numbers. The complete characterization of the transcriptome of different cell types is already a reality today (Au et al, 2013; Peng et al, 2012; Xue et al, 2014).

