Abstract

This work discusses bioinformatics and experimental approaches to explore the human proteome, a constellation of proteins expressed in different tissues and organs. As the human proteome is not a static entity, it is necessary both to estimate the number of different protein species (proteoforms) and to measure the number of copies of the same protein in a specific tissue. Here, a meta-analysis of the neXtProt knowledge base is proposed for theoretical prediction of the number of different proteoforms that arise from alternative splicing (AS), single amino acid polymorphisms (SAPs), and posttranslational modifications (PTMs). Three possible cases are considered: (1) PTMs and SAPs appear exclusively in the canonical sequences of proteins, but not in splice variants; (2) PTMs and SAPs can occur both in proteins encoded by canonical sequences and in splice variants; (3) all modification types (AS, SAP, and PTM) occur as independent events. Experimental validation of proteoforms is limited by the analytical sensitivity of proteomic technology. A bell-shaped distribution histogram was generated for proteins encoded by a single chromosome, with the estimation of copy numbers in plasma, liver, and the HepG2 cell line. The proposed meta-bioinformatics approaches can be used to estimate the number of different proteoforms for any group of protein-coding genes.
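The three cases can be made concrete with a small counting sketch. The abstract does not give the formulas behind the estimates, so the model below is an illustrative assumption, not the authors' method: each proteoform carries at most one SAP and at most one PTM, and every annotated site is assumed to be present in every splice variant. The function name `proteoform_counts` and its parameters are hypothetical.

```python
def proteoform_counts(n_as: int, n_sap: int, n_ptm: int) -> tuple[int, int, int]:
    """Illustrative proteoform counts for one gene under the three cases.

    n_as  -- number of splice variants in addition to the canonical sequence
    n_sap -- number of annotated SAP sites
    n_ptm -- number of annotated PTM sites

    Assumption (not from the source): a molecule carries at most one SAP
    and at most one PTM, and every site exists in every splice variant.
    """
    if min(n_as, n_sap, n_ptm) < 0:
        raise ValueError("counts must be non-negative")

    isoforms = 1 + n_as  # canonical sequence plus its splice variants

    # Case 1: SAPs and PTMs occur only on the canonical sequence,
    # so each site adds exactly one extra species.
    case1 = isoforms + n_sap + n_ptm

    # Case 2: every isoform may be unmodified or carry one SAP or one PTM.
    case2 = isoforms * (1 + n_sap + n_ptm)

    # Case 3: AS, SAP, and PTM are independent, so an isoform may carry
    # a SAP (or none) and, independently, a PTM (or none).
    case3 = isoforms * (1 + n_sap) * (1 + n_ptm)

    return case1, case2, case3


if __name__ == "__main__":
    # Hypothetical gene: 2 splice variants, 3 SAP sites, 4 PTM sites.
    print(proteoform_counts(2, 3, 4))
```

Even under these simplified assumptions, the estimate grows quickly from case 1 to case 3, which is why the choice of model matters when the counts are summed over a group of protein-coding genes.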

Highlights

  • Genome sequencing [1] deciphered the number of protein-coding genes, establishing an initial estimate of the complexity associated with human molecular biology

  • Taking into account products of alternative splicing (AS), those containing single amino acid polymorphisms (SAPs) arising from nonsynonymous single-nucleotide polymorphisms, and those that undergo posttranslational modifications (PTMs) [4, 5], as many as 100 different proteins can potentially be produced from a single gene

  • We proposed that the volume of representative data uploaded to UniProt [12] each year since 2005 was sufficient to calculate the average number of protein variants per gene and the numbers for each type of variation

Summary

From Human Genome to Human Proteome

Genome sequencing [1] deciphered the number of protein-coding genes, establishing an initial estimate of the complexity associated with human molecular biology. Experimental validation of protein species is limited by the analytical sensitivity of proteomic technology: the sensitivity of the technology determines the ability to detect rare protein species. This limitation originates from the basic difference between genomics and proteomics [8]. Complete (100%) coverage of a protein sequence using bottom-up MS is not attainable, so it is impossible to detect all potential protein species expressed from the same gene. The bioinformatic analysis of the diversity of protein species was anticipated to create the backbone for the future experimental exploration of the proteome space.
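To illustrate the coverage limitation, sequence coverage can be computed as the fraction of residues that fall inside at least one identified peptide. The following is a minimal sketch, not a procedure taken from the source: the function name `sequence_coverage`, the toy protein sequence, and the peptide list are all hypothetical, and peptides are matched by exact substring search only.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Fraction of protein residues covered by at least one peptide.

    Peptides are located by exact substring matching; every occurrence
    of a peptide in the protein sequence is counted as covered.
    """
    if not protein:
        raise ValueError("protein sequence must not be empty")

    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue  # ignore empty peptide strings
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)

    return sum(covered) / len(protein)


if __name__ == "__main__":
    # Hypothetical 10-residue protein and two identified peptides.
    protein = "MKTAYIAKQR"
    peptides = ["MKTAY", "AKQR"]
    print(f"coverage = {sequence_coverage(protein, peptides):.0%}")
```

In this toy case one residue is left uncovered. A SAP or PTM located at an uncovered position would not be observed, so a protein species that differs only at that position could not be distinguished from the unmodified form.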

How Many Different Proteins Are Necessary to Support Human Function?
Findings
How Many Protein Species Are Detectable Today?