Abstract

Database search tools identify peptides by matching tandem mass spectra against a protein database. We study an alternative approach in which all plausible de novo interpretations of a spectrum (a spectral dictionary) are generated and then quickly matched against the database. We present a new algorithm, MS-Dictionary, for efficiently generating spectral dictionaries and demonstrate that it can identify spectra that are missed by database search. We argue that MS-Dictionary enables proteogenomics searches in six-frame translations of genomic sequences, which may be prohibitively time-consuming for existing database search approaches. We show that such searches allow one to correct sequencing errors and find programmed frameshifts.

Highlights

  • Database search tools identify peptides by matching tandem mass spectra against a protein database

  • Similar to generating a covering set of tags, one can try to generate a covering set of full-length peptide reconstructions that contains the correct peptide with high probability

  • We found good correlation between the MS-Dictionary scoring function and the scoring functions used in the database search tools; the correlation coefficients are 0.87 for SEQUEST, 0.90 for X!Tandem, and 0.96 for InsPecT (Fig. 3)


Summary

Introduction

Database search tools identify peptides by matching tandem mass spectra against a protein database. In 1994, Mann and Wilm [1] proposed the peptide sequence tag approach and outlined its applications for protein identification. It took 10 years for this approach to result in accurate tag-based tools like InsPecT [2] and Paragon [3], currently among the fastest MS/MS database search tools. The reason for this delay is that, while generating some peptide sequence tags is easy, a set of tags is of little use unless it contains at least one correct tag with high probability. We describe a fast approach to generating spectral dictionaries that takes ≈0.1 s per spectrum and benchmark it on a data set of over 20,000 peptides.
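To make the idea of matching a spectral dictionary against a six-frame translation concrete, here is a minimal Python sketch. It is not the MS-Dictionary implementation: the function names `six_frame_translation` and `match_dictionary` are hypothetical, the dictionary is assumed to be already available as a set of peptide strings, and the lookup is plain exact string matching with the standard genetic code. A real tool would also need to handle mass-equivalent residues (for example I and L) and would likely use a more efficient index.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frame_translation(dna):
    """Translate a DNA string in three forward and three reverse frames.

    Stop codons are written as '*'; codons with unknown bases become 'X'.
    """
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = []
    for strand in (dna, reverse_complement):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


def match_dictionary(dictionary, sequences):
    """Return (peptide, sequence index, start) for every exact occurrence
    of a dictionary peptide in the given sequences."""
    # Group peptides by length so each window needs one set lookup per length.
    by_length = {}
    for peptide in dictionary:
        by_length.setdefault(len(peptide), set()).add(peptide)

    hits = []
    for seq_index, seq in enumerate(sequences):
        for length, peptides in by_length.items():
            for start in range(len(seq) - length + 1):
                window = seq[start:start + length]
                if window in peptides:
                    hits.append((window, seq_index, start))
    return hits


if __name__ == "__main__":
    frames = six_frame_translation("ATGGCCAAGTAA")
    # Toy dictionary; in practice it would hold the de novo reconstructions
    # generated for one spectrum.
    print(match_dictionary({"MAK", "GQV", "PEPTIDE"}, frames))
```

Grouping the dictionary by peptide length keeps the scan simple: each position in a translated frame is checked once per distinct peptide length, regardless of how many peptides the dictionary contains.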

