Abstract

Human blood plasma can be obtained relatively noninvasively and contains proteins from most, if not all, tissues of the body. Therefore, an extensive, quantitative catalog of plasma proteins is an important starting point for the discovery of disease biomarkers. In 2005, we showed that different proteomics measurements using different sample preparation and analysis techniques identify significantly different sets of proteins, and that a comprehensive plasma proteome can be compiled only by combining data from many different experiments. Applying advanced computational methods developed for the analysis and integration of very large and diverse data sets generated by tandem MS measurements of tryptic peptides, we have now compiled a high-confidence human plasma proteome reference set with well over twice the identified proteins of previous high-confidence sets. It includes a hierarchy of protein identifications at different levels of redundancy following a clearly defined scheme, which we propose as a standard that can be applied to any proteomics data set to facilitate cross-proteome analyses. Further, to aid in development of blood-based diagnostics using techniques such as selected reaction monitoring, we provide a rough estimate of protein concentrations using spectral counting. We identified 20,433 distinct peptides, from which we inferred a highly nonredundant set of 1929 protein sequences at a false discovery rate of 1%. We have made this resource available via PeptideAtlas, a large, multiorganism, publicly accessible compendium of peptides identified in tandem MS experiments conducted by laboratories around the world.
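As an illustration of the spectral counting idea mentioned above, the following is a minimal sketch and not the authors' actual procedure. It assumes a common length-normalized variant (similar to the normalized spectral abundance factor), in which each protein's spectrum count is divided by its sequence length and then scaled so that all values sum to 1. The function name, protein labels, and lengths are hypothetical.

```python
from collections import Counter


def spectral_count_fractions(psm_proteins, protein_lengths):
    """Return a relative abundance estimate per protein.

    psm_proteins: one protein identifier per identified spectrum (hypothetical input).
    protein_lengths: mapping from protein identifier to sequence length in residues.
    """
    # Count how many identified spectra were assigned to each protein.
    counts = Counter(psm_proteins)
    # Normalize by length, since longer proteins yield more observable peptides.
    per_residue = {p: counts[p] / protein_lengths[p] for p in counts}
    total = sum(per_residue.values())
    # Scale so the estimates sum to 1 across all observed proteins.
    return {p: v / total for p, v in per_residue.items()}


# Hypothetical example: protein "A" has twice the spectra but is twice as long.
example = spectral_count_fractions(
    ["A", "A", "A", "A", "B", "B"],
    {"A": 200, "B": 100},
)
```

Such values are relative only. Converting them to absolute concentrations would require calibration against proteins of known concentration, which this sketch does not attempt.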

Highlights

  • From the ‡Institute for Systems Biology, Seattle, WA; §Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI; ¶Department of Medicine, University of Southern California, Los Angeles, CA; ʈDepartment of Biology, Institute of Molecular Systems Biology, ETH (Swiss Federal Institute of Technology), Zurich, Switzerland; ʈʈSwedish Neuroscience Institute, Swedish Medical Center, Seattle, WA; ‡‡Systems Biology Division, Zhejiang-California International Nanosystems Institute (ZCNI), Zhejiang University, Hangzhou, Zhejiang, China; §§Department of Pathology, Johns Hopkins University, Baltimore, MD

  • Confidence, and Completeness of Proteome Reference Set: The 2010 Human Plasma PeptideAtlas, constructed from 91 LC-MS/MS data sets, contains 1929 canonical protein sequences with an estimated protein false discovery rate (FDR) of <0.98% (Fig. 4 and supplemental Table S3)

  • PeptideAtlas is an integral part of the ProteomeXchange infrastructure for Human Proteome Organization (HUPO) initiatives and other worldwide data submissions (figure published in (65)), together with the ProteomeCommons.org Tranche distributed file-sharing system (66) and the EBI PRIDE (67) database


Introduction

Eighteen laboratories contributed tandem MS findings and protein identifications, which were integrated by a collaborative process into a core data set of 3020 proteins from the International Protein Index (IPI) database (3) containing two or more identified peptides, plus filters for smaller, higher confidence lists (4, 5). Chan et al. reported 1444 unique proteins in serum using a multidimensional peptide separation strategy (12), of which 1019 mapped to IPI and 257 to the PPP-I core data set. These previous efforts highlight the challenges associated with accurately determining the number of proteins inferred from large proteomic data sets, and with comparing the proteins identified in different data sets.
