Abstract
Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries grow quickly with the peptide length, making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length, thus addressing the limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small, thus opening a possibility of using them to speed up MS/MS searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time consuming with existing approaches. MS-GappedDictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full-length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags, and we introduce gapped tags that have advantages over the conventional peptide sequence tags in MS/MS database searches.
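To make the notion of a gapped peptide concrete, the following minimal sketch treats a gapped peptide as a list of integer masses, where each entry is either a single residue mass or a "gap" equal to the summed mass of several consecutive residues. It is an illustration only and not the MS-GappedDictionary implementation: the names `MASS`, `matches_at`, and `find_gapped` are hypothetical, and the sketch assumes nominal (integer) residue masses with unmodified cysteine.

```python
# Nominal (integer) residue masses in daltons. Assumption: unmodified
# cysteine (103); I/L and K/Q are indistinguishable at this resolution.
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def matches_at(protein, start, gapped):
    """Return the end index if every mass in `gapped` can be consumed,
    in order, by whole residues of protein[start:]; otherwise None."""
    pos = start
    for target in gapped:
        total = 0
        # Residue masses are positive, so the running total is strictly
        # increasing and can hit `target` at most once.
        while total < target and pos < len(protein):
            total += MASS[protein[pos]]
            pos += 1
        if total != target:
            return None
    return pos


def find_gapped(protein, gapped):
    """Return all substrings of `protein` consistent with the gapped peptide."""
    hits = []
    for start in range(len(protein)):
        end = matches_at(protein, start, gapped)
        if end is not None:
            hits.append(protein[start:end])
    return hits


# Hypothetical example: T, then a gap covering A+Y, then I, then a gap covering A+K.
print(find_gapped("MKTAYIAKQR", [101, 234, 113, 199]))  # → ['TAYIAK']
```

A gap of 234 Da is compatible with any residue combination of that total mass, which is why a single gapped peptide can stand in for many full-length reconstructions.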
Highlights
From the ‡Department of Electrical and Computer Engineering, University of California, San Diego, CA; §Department of Computer Science and Engineering, University of California, San Diego, CA
Shewanella Data Set—To benchmark the performance of MS-GappedDictionary, we adopted the Shewanella data set composed of 18,468 charge two spectra from Shewanella oneidensis MR-1, each representing a distinct tryptic peptide [22]. (Although this paper focuses on doubly charged spectra, the same generating function approach works for spectra with higher charges, as shown in [25].) The spectra in this data set were acquired on an ion trap MS (LCQ, ThermoFinnigan, San Jose, CA) using ESI and were identified with InsPecT 197 MS-GeneratingFunction [18, 21] to ensure that all Peptide Spectrum Matches (PSMs) have spectral probabilities below 10⁻⁹.
Although future work will focus on efficient matching of gapped peptides against large databases, we show how gapped tags can be generated from gapped peptides to effectively filter indexed databases.
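As a rough illustration of deriving gapped tags from a gapped peptide, the sketch below takes every window of `k` consecutive masses and records the mass that precedes it. This excerpt does not describe how the paper actually constructs gapped tags, so the windowing rule, the function name `gapped_tags`, and the parameter `k` are all assumptions made for illustration.

```python
def gapped_tags(gapped, k):
    """Return (prefix_mass, tag) pairs, one per window of k consecutive
    masses in the gapped peptide. Assumption: a tag is simply such a window."""
    tags = []
    prefix = 0  # total mass of everything before the current window
    for i in range(len(gapped) - k + 1):
        tags.append((prefix, tuple(gapped[i:i + k])))
        prefix += gapped[i]
    return tags


# Hypothetical gapped peptide with four masses, cut into tags of length 3.
print(gapped_tags([101, 234, 113, 199], 3))
# → [(0, (101, 234, 113)), (101, (234, 113, 199))]
```

Short tags of this kind could be looked up in an index first, so that the full gapped peptide is compared only against the database regions that the tag selects.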
Summary
Gapped Spectral Dictionaries and Their Applications for Database Searches of Tandem Mass Spectra. Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be generated for any peptide length, addressing the limitation of the Spectral Dictionary approach. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time consuming with existing approaches. MS-GappedDictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full-length peptide reconstructions. The size of the Spectral Dictionary for a typical 15-aa long peptide may exceed a billion peptides, making it too large for an MS/MS database search.