Abstract

The protein-level translational status and function of many alternative splicing events remain poorly understood. We use an RNA sequencing (RNA-seq)-guided proteomics method to identify protein alternative splicing isoforms in the human proteome by constructing tissue-specific protein databases that prioritize transcript splice junction pairs with high translational potential. Using the custom databases to reanalyze ~80 million mass spectra in public proteomics datasets, we identify more than 1,500 noncanonical protein isoforms across 12 human tissues, including ~400 sequences undocumented on TrEMBL and RefSeq databases. We apply the method to original quantitative mass spectrometry experiments and observe widespread isoform regulation during human induced pluripotent stem cell cardiomyocyte differentiation. On a proteome scale, alternative isoform regions overlap frequently with disordered sequences and post-translational modification sites, suggesting that alternative splicing may regulate protein function through modulating intrinsically disordered regions. The described approach may help elucidate functional consequences of alternative splicing and expand the scope of proteomics investigations in various systems.

Highlights

  • Protein species outnumber coding genes in eukaryotes, in part, because one gene can encode multiple transcripts through alternative splicing (AS) (Aebersold et al, 2018; Smith and Kelleher, 2018)

  • Generation of junction-centric protein sequence databases: we assembled a computational workflow to translate AS junctions to protein sequences in silico (Figure 1A)

  • We retrieved ENCODE RNA sequencing (RNA-seq) data on the GTEx tissue collection of human heart, lungs, liver, pancreas, transverse colon, ovary, testis, prostate, spleen, thyroid, esophagus, and adrenal gland, each containing 101 nt paired-end (PE) total RNA-seq data with 2 biological replicates passing ENCODE consortium-wide quality control
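The in silico translation of AS junctions mentioned in the highlights can be illustrated with a minimal Python sketch. This is not the authors' workflow: the function names `translate` and `open_junction_frames` are hypothetical, and the rule of keeping only reading frames without a stop codon is an assumed simplification of how junction sequences with translational potential might be selected.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA sequence from the given frame offset (0, 1, or 2).

    Trailing bases that do not form a complete codon are ignored.
    Stop codons are rendered as '*'.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def open_junction_frames(upstream_exon: str, downstream_exon: str) -> dict:
    """Join two exon flanks at a splice junction and translate all three frames.

    Assumption (not from the source): a frame is kept only if its
    translation contains no stop codon across the junction.
    """
    joined = upstream_exon + downstream_exon
    frames = {}
    for frame in range(3):
        peptide = translate(joined, frame)
        if "*" not in peptide:
            frames[frame] = peptide
    return frames


if __name__ == "__main__":
    # Toy exon flanks, invented for illustration only.
    print(open_junction_frames("ATGGCC", "AAATGA"))
```

In practice the retained peptides would be written to a FASTA file for a proteomics search engine; that step is omitted here because the source excerpt does not describe it.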


Introduction

Protein species outnumber coding genes in eukaryotes, in part because one gene can encode multiple transcripts through alternative splicing (AS) (Aebersold et al., 2018; Smith and Kelleher, 2018). RNA-seq experiments have discovered over 100,000 AS transcripts in the human genome (Pan et al., 2008; Wang et al., 2008), but identifying which AS isoforms are functionally important remains a major unmet goal, and, critically, most have never been detected at the protein level. Larger sequence databases (e.g., TrEMBL and RefSeq) exist, but it is unclear whether the majority of deposited sequences are bona fide isoforms or instead gene fragments, polymorphisms, and redundant entries. Because of these limitations, the molecular functions of most AS events at the protein level remain severely under-characterized, and a systematic picture of how AS rewires proteome functions is lacking (Tress et al., 2017a, 2017b).

