Abstract

The principle of shotgun proteomics is to use peptide mass spectra in order to identify corresponding sequences in a protein database. The quality of peptide and protein identification and quantification critically depends on the sensitivity and specificity of this assignment process. Many peptides in proteomic samples carry biochemical modifications, and a large fraction of unassigned spectra arise from modified peptides. Spectra derived from modified peptides can erroneously be assigned to wrong amino acid sequences. However, the impact of this problem on proteomic data has not yet been investigated systematically. Here we use combinations of different database searches to show that modified peptides can be responsible for 20-50% of false positive identifications in deep proteomic data sets. These false positive hits are particularly problematic as they have significantly higher scores and higher intensities than other false positive matches. Furthermore, these wrong peptide assignments lead to hundreds of false protein identifications and systematic biases in protein quantification. We devise a "cleaned search" strategy to address this problem and show that this considerably improves the sensitivity and specificity of proteomic data. In summary, we show that modified peptides cause systematic errors in peptide and protein identification and quantification and should therefore be considered to further improve the quality of proteomic data annotation.

Highlights

  • The principle of shotgun proteomics is to use peptide mass spectra in order to identify corresponding sequences in a protein database

  • We use combinations of different database searches to show that modified peptides can be responsible for 20-50% of false positive identifications in deep proteomic data sets

  • Mass spectrometry has matured to a level where it is able to assess the complexity of the human proteome [1]


Summary

Introduction

The principle of shotgun proteomics is to use peptide mass spectra in order to identify corresponding sequences in a protein database. We use combinations of different database searches to show that modified peptides can be responsible for 20-50% of false positive identifications in deep proteomic data sets. We identify modified peptides as a systematic source of biases in protein identification and quantification in deep proteomic data sets and outline a strategy to minimize type I errors caused by modified peptides.

