Abstract

The proteins in blood were all first expressed as mRNAs from genes within cells. There are databases of human proteins that are known to be expressed as mRNA in human cells and tissues. Proteins identified from human blood by correlation of mass spectra, but that fail to match human mRNA expression products, may not be correct. We compared the proteins that 10 different groups identified in human blood by mass spectrometry through correlation to human and nonhuman nucleic acid sequences. We determined whether the peptides or proteins identified by the different groups mapped to the known human proteins of the Reference Sequence (RefSeq) database. We used Structured Query Language (SQL) database searches of the peptide sequences correlated to tandem mass spectrometry spectra, and Basic Local Alignment Search Tool (BLAST) analysis of the identified full-length proteins, to control for correlation to the wrong peptide sequence or for the same or a very similar peptide sequence being shared by more than one protein. Mass spectra were correlated against large protein databases that contain many sequences that may not be expressed in human beings, yet the searches returned a very high percentage of peptides or proteins that are known to be found in humans. Only about 5% of proteins mapped to hypothetical sequences, which agrees with the false-positive rate reported for the search algorithm conditions. The results were highly enriched in secreted and soluble proteins and diminished in insoluble or membrane proteins. Most of the proteins identified were relatively short and showed a size distribution similar to that of the RefSeq database. At least three groups agree on a nonredundant set of 1671 types of proteins, and a nonredundant set of 3151 proteins was identified by at least three peptides.

Highlights

  • High-throughput tandem mass spectrometry (MS/MS) based peptide correlation analysis of complex biological samples yields long lists of proteins (1)

  • The groups used different sample preparation methods, including liquid chromatography (LC)-polyacrylamide gel electrophoresis (PAGE) (3), LC/LC-MS/MS (5,8–10), isoelectric focusing (11), no sample preparation followed by ultra high-performance liquid chromatography (HPLC) (9), fractionation followed by ultra HPLC (12), and low molecular mass filtration (13)

  • The peptide sequences were used to create a nonredundant set of 2704 Reference Sequence (RefSeq) proteins identified by Shen et al (12) that still contained an exact match to the peptides from

Introduction

High-throughput tandem mass spectrometry (MS/MS) based peptide correlation analysis of complex biological samples yields long lists of proteins (1). Serum may contain most human proteins (2), but most are not detectable by chromatography followed by polyacrylamide gel electrophoresis (PAGE) (3). Some of the peptides from apparently low-abundance proteins have reportedly resulted from nontryptic activities or contained missed cleavage sites. We obtained the published protein expression lists generated by 10 research groups from the MS analysis of human blood. Most proteins have been identified by MS/MS with collision-induced dissociation of tryptic digests from blood proteins. These fragmentation spectra were correlated to as many proteins as possible from comprehensive protein databases (8). A number of correlation algorithms have been developed and tested empirically to determine the scoring parameters that result in acceptable false-positive rates of about 5% (14,15–19), based on searches of both real and nonphysiological protein databases (6,8,17), but this alone may not be sufficient to ensure correct identification (7).
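
The abstract describes SQL database searches that map reported peptide sequences to RefSeq proteins by exact match, and tallies of how many groups and peptides support each protein. The following is a minimal sketch of that idea only, not the authors' actual schema or queries. It assumes an in-memory SQLite database; the table names, column names, accessions, and sequences are all hypothetical toy values.

```python
import sqlite3


def map_peptides(proteins, peptides):
    """Count distinct peptides and distinct groups supporting each protein.

    proteins: list of (accession, sequence) tuples (hypothetical toy data).
    peptides: list of (group, peptide_sequence) tuples (hypothetical toy data).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE protein (accession TEXT PRIMARY KEY, sequence TEXT)")
    con.execute("CREATE TABLE peptide (grp TEXT, sequence TEXT)")
    con.executemany("INSERT INTO protein VALUES (?, ?)", proteins)
    con.executemany("INSERT INTO peptide VALUES (?, ?)", peptides)
    # instr() > 0 means the peptide occurs as an exact substring of the protein.
    rows = con.execute(
        """
        SELECT p.accession,
               COUNT(DISTINCT q.sequence) AS n_peptides,
               COUNT(DISTINCT q.grp)      AS n_groups
        FROM protein AS p
        JOIN peptide AS q ON instr(p.sequence, q.sequence) > 0
        GROUP BY p.accession
        ORDER BY p.accession
        """
    ).fetchall()
    con.close()
    return {acc: {"peptides": n_pep, "groups": n_grp} for acc, n_pep, n_grp in rows}


# Toy inputs: accessions and sequences are invented for illustration.
toy_proteins = [
    ("NP_TOY_1", "MKWVTFISLLLLFSSAYSRGVFRR"),
    ("NP_TOY_2", "MALWMRLLPLLALLALWGPDPAAA"),
]
toy_peptides = [
    ("A", "WVTFISLLLLFSSAYSR"),
    ("B", "WVTFISLLLLFSSAYSR"),
    ("C", "GVFR"),
    ("A", "LLPLLALLALWGPDPAAA"),
    ("B", "XXXX"),  # matches no protein, so it contributes nothing
]

result = map_peptides(toy_proteins, toy_peptides)
# Proteins reported by at least three groups, mirroring the abstract's criterion.
supported_by_three_groups = [a for a, c in result.items() if c["groups"] >= 3]
```

A substring join like this also exposes peptides shared by more than one protein, since the same peptide row can match several accessions. The abstract notes that shared or very similar sequences were additionally checked with BLAST, which this sketch does not attempt.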
