Abstract

One of the major goals of the Chromosome-Centric Human Proteome Project (C-HPP) is to catalog and annotate the myriad heterogeneous proteoforms produced by ca. 20,000 genes. To achieve a detailed and personalized understanding of proteomes, we suggest using a customized RNA-seq library of potential proteoforms, which includes aberrant variants specific to certain biological samples. Two-dimensional electrophoresis coupled with high-performance liquid chromatography allowed us to reduce the complexity of the biological mixture before shotgun mass spectrometry. To benchmark the proposed pipeline, we examined the heterogeneity of the HepG2 hepatoblastoma cell line proteome. Data are available via ProteomeXchange with identifier PXD018450.

Highlights

  • Genes are the origin story of how an organism develops into what it becomes

  • Custom RNA-seq data for the HepG2 cell line sample were used as the basis for an in-house personalized library of amino acid sequences that could potentially be realized at the protein level

  • This library contains 51,836 sequences encoded by 12,148 genes (~4.3 sequences per gene): 11,347 canonical sequences, 8,489 splice variants, 8,741 single amino acid polymorphisms (SAP), 9,822 potential proteoforms with insertions or deletions, and 13,437 sequences with single amino acid changes, insertions, and deletions in alternatively spliced transcripts (Figure 1)
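
The library composition reported in the last highlight can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are copied from the highlight, while the category keys and the `summarize` helper are hypothetical names introduced here, not part of the authors' pipeline.

```python
# Category counts as reported in the highlight above.
# The dictionary keys are illustrative labels, not names used by the authors.
library_counts = {
    "canonical": 11_347,
    "splice_variants": 8_489,
    "single_amino_acid_polymorphisms": 8_741,
    "insertions_or_deletions": 9_822,
    "changes_in_spliced_transcripts": 13_437,
}
REPORTED_TOTAL = 51_836  # sequences in the library, as reported
REPORTED_GENES = 12_148  # genes encoding those sequences, as reported


def summarize(counts, genes):
    """Return the total, the mean sequences per gene, and percentage shares."""
    total = sum(counts.values())
    per_gene = round(total / genes, 1)
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, per_gene, shares


total, per_gene, shares = summarize(library_counts, REPORTED_GENES)

print(f"total sequences: {total}")
print(f"sequences per gene: {per_gene}")
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The five categories sum to the reported library size, and the ratio to the gene count reproduces the stated figure of about 4.3 sequences per gene.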

Introduction

Genes are the origin story of how an organism develops into what it becomes. Three billion nucleotide letters turn into bodies with unique phenotypes and unique molecular portraits, which arise from the individual power of the genome. A combination of postgenome technologies provides the opportunity to accelerate the understanding of the molecular mechanisms of living systems. Proteoforms are among the most tantalizing postgenome objects to study [1]. Proteoforms are different protein products encoded by one gene, and they can differ dramatically from each other [2,3,4]. Alternative splicing (AS), single amino acid polymorphisms (SAP), and post-translational modifications (PTM) intrude upon the dominion of genes and crucially modulate biological processes, including those involved in the development of disease. This active attention has made proteoforms a popular object of postgenome research. The lack of reliable information about the heterogeneity of the proteome increases interest in this layer of biological information [5].

