Abstract

The impact of mixture spectra deconvolution on the performance of four popular de novo sequencing programs was tested using artificially constructed mixture spectra as well as experimental proteomics data. Mixture fragmentation spectra are a recognized limitation in proteomics because they decrease the identification performance of database search engines. De novo sequencing approaches are expected to be even more sensitive to the reduction in mass spectrum quality that results from peptide precursor co‐isolation and are thus prone to false identifications. The deconvolution approach matched complementary b‐ and y‐ions to each precursor peptide mass, which allowed the creation of virtual spectra containing the sequence‐specific fragment ions of each co‐isolated peptide. Deconvolution processing resulted in equally efficient identification rates but increased the absolute number of correctly sequenced peptides. The improvement was in the range of 20–35% additional peptide identifications for a HeLa lysate sample. Some correct sequences were identified only from unprocessed spectra; however, these were fewer than the sequences gained through mass spectral deconvolution. A tight candidate peptide score distribution and high sensitivity to small changes in the mass spectrum introduced by the employed deconvolution method could explain some of the missing peptide identifications.
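The complementary‐ion matching described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes singly charged fragment ions, known neutral monoisotopic precursor masses, and an absolute mass tolerance, and the function name `deconvolve` and the parameter `tol` are hypothetical. It relies on the fact that a singly charged b‐ion and its complementary y‐ion sum to the neutral peptide mass plus two proton masses.

```python
PROTON = 1.007276  # proton mass in Da


def deconvolve(peaks, precursor_masses, tol=0.02):
    """Split a mixture spectrum into one virtual spectrum per precursor.

    peaks: list of (mz, intensity) tuples, assumed singly charged fragments.
    precursor_masses: neutral monoisotopic masses of the co-isolated peptides.
    tol: absolute tolerance in Da (hypothetical default, not from the paper).
    """
    peaks = sorted(peaks)
    virtual = {}
    for mass in precursor_masses:
        # Complementary singly charged b/y pairs sum to M + 2 protons.
        target = mass + 2 * PROTON
        lo, hi = 0, len(peaks) - 1
        kept = set()
        # Two-pointer scan over the m/z-sorted peak list.
        while lo < hi:
            total = peaks[lo][0] + peaks[hi][0]
            if abs(total - target) <= tol:
                kept.update((lo, hi))
                lo += 1
                hi -= 1
            elif total < target:
                lo += 1
            else:
                hi -= 1
        virtual[mass] = [peaks[i] for i in sorted(kept)]
    return virtual
```

Peaks that have no complementary partner for a given precursor, such as noise or fragments of the other co‐isolated peptide, are left out of that precursor's virtual spectrum. A real implementation would also need to handle multiply charged fragments and ambiguous pairings, which this sketch ignores.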

Highlights

  • The most common method of analyzing proteomics data involves searching against protein sequence databases (UniProt [1], NCBI GenBank [2], etc.) that accumulate protein and genetic information from numerous experimental sources [1]

  • Artificially created mixture spectra provided an initial assessment of the effect of mixture spectra on de novo sequencing (see Extended Materials and Methods in the Supporting Information for preparation details)

  • Using artificial mixture spectra with various mixture ratios, we demonstrated that, at high mixture ratios, the true identification rate was affected by the presence of co‐isolated fragments in the mass spectrum

Introduction

The most common method of analyzing proteomics data involves searching against protein sequence databases (UniProt [1], NCBI GenBank [2], etc.) that accumulate protein and genetic information from numerous experimental sources [1]. Only a portion of the known database sequences is used, e.g., the search is limited to certain biological species and protein modifications, and some errors are possible.