Abstract

Among the 20 000 human gene products predicted from genome annotation, about 3000 still lack validation at the protein level. We developed PepPSy, a user-friendly gene expression-based prioritization system, to help investigators determine in which human tissues they should look for an unseen protein. PepPSy can also be used by biocurators to revisit the annotation of specific categories of proteins based on the ‘omics’ data housed by the system. In this study, it was used to prioritize 21 dubious protein-coding genes among the 616 annotated in neXtProt for reannotation. PepPSy is freely available at http://peppsy.genouest.org.

Database URL: http://peppsy.genouest.org

Highlights

  • The Human Proteome Project (HPP) launched by the Human Proteome Organization (HUPO) aims at providing experimental validation for these proteoforms and understanding their role in health and disease [1]. neXtProt is an innovative knowledge platform focusing on human proteins that is built on top of UniProtKB/Swiss-Prot annotations [2] and provides additional expert-curated information on protein expression, subcellular localization, post-translational modifications and protein variations, gathered from selected high-throughput datasets [3]

  • Because cDNAs for FAM71E2 have been found in different tissues, the status of the entry has been changed to PE2 in UniProtKB (04-MAR-2015 release)

Summary

Introduction

Analysis of the human genome led to the identification of approximately 20 000 protein-coding genes, which produce a variety of functional proteoforms via different mechanisms including genetic polymorphisms, alternative splicing, post-translational modifications, or processing.

The result is displayed in the form of a table (Figure 2A) containing one protein entry per line, with columns for:

  • neXtProt IDs (hyperlinked to the neXtProt knowledgebase)
  • gene symbols (hyperlinked to the HGNC database [24]) and protein descriptions
  • NCBI Entrez gene IDs (hyperlinked to the NCBI website)
  • the color-coded evolution of the neXtProt PE status over time
  • the current PE status
  • the observability status according to the classification published by Farrah et al. [18]
  • the rank of each neXtProt entry, computed by the PepPSy prioritization system based on the weight scheme defined by the user
  • the human tissues in which the corresponding gene products display the highest abundance, based on the six distinct transcriptomic and proteomic datasets (NCBI UniGene, Affymetrix 3′ array and All Exon, Illumina RNA sequencing, HPA antibody-based and HPM LC/MS-based protein expression profiles)
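The idea of ranking entries under a user-defined weight scheme can be illustrated with a minimal sketch. The text above does not give the formula PepPSy actually uses, so everything here is an assumption made for illustration: the weighted-mean scoring, the 0 to 1 score scale, the dataset keys, the `Entry` class and the placeholder entry IDs are all hypothetical and are not taken from the PepPSy implementation.

```python
from dataclasses import dataclass, field

# Hypothetical keys for the six datasets named in the text; the identifiers
# themselves are invented for this sketch.
DATASETS = (
    "unigene", "affy_3prime", "affy_all_exon",
    "illumina_rnaseq", "hpa", "hpm",
)


@dataclass
class Entry:
    entry_id: str                                # placeholder, not a real neXtProt ID
    scores: dict = field(default_factory=dict)   # assumed per-dataset score in [0, 1]


def weighted_score(entry, weights):
    """Weighted mean of per-dataset scores; a missing dataset counts as 0."""
    total_weight = sum(weights.get(name, 0.0) for name in DATASETS)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        weights.get(name, 0.0) * entry.scores.get(name, 0.0)
        for name in DATASETS
    )
    return weighted / total_weight


def rank_entries(entries, weights):
    """Return (rank, entry_id, score) tuples, highest score first.

    sorted() is stable, so entries with equal scores keep their input order.
    """
    ordered = sorted(
        entries, key=lambda e: weighted_score(e, weights), reverse=True
    )
    return [
        (position + 1, e.entry_id, round(weighted_score(e, weights), 3))
        for position, e in enumerate(ordered)
    ]


# Toy data: a user who trusts RNA sequencing twice as much as HPA or HPM.
example_entries = [
    Entry("ENTRY_B", {"hpa": 0.8, "hpm": 0.6}),
    Entry("ENTRY_A", {"unigene": 0.2, "illumina_rnaseq": 0.9}),
    Entry("ENTRY_C"),
]
example_weights = {"illumina_rnaseq": 2.0, "hpa": 1.0, "hpm": 1.0}

for rank, entry_id, score in rank_entries(example_entries, example_weights):
    print(rank, entry_id, score)
```

Changing `example_weights` reorders the table without touching the underlying scores, which is the property a user-tunable weight scheme is meant to provide. A weighted mean is only one possible aggregation; rank-based or evidence-count schemes would fit the same interface.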

Design and implementation
Conclusion
