Abstract

Top-down proteomics studies intact proteins, enabling new opportunities for analyzing post-translational modifications. Because tandem mass spectra of intact proteins are very complex, spectral deconvolution (grouping peaks into isotopomer envelopes) is a key initial step in their interpretation. In such spectra, isotopomer envelopes of different protein fragments span overlapping regions of the m/z axis and may even share spectral peaks. This raises both pattern-recognition and combinatorial challenges for spectral deconvolution. We present MS-Deconv, a combinatorial algorithm for spectral deconvolution. The algorithm first generates a large set of candidate isotopomer envelopes for a spectrum, then represents the spectrum as a graph, and finally selects the highest-scoring subset of envelopes as a heaviest path in the graph. In contrast with other approaches, the algorithm scores sets of envelopes rather than individual envelopes. We demonstrate that MS-Deconv outperforms Thrash and Xtract both in the number of correctly recovered monoisotopic masses and in speed. We applied MS-Deconv to a large set of top-down spectra from Yersinia rohdei (whose genome is still unsequenced) and matched the results against the protein database of the related, sequenced bacterium Yersinia enterocolitica. MS-Deconv is available at http://proteomics.ucsd.edu/Software.html.
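The selection step described above (choosing the highest-scoring subset of envelopes as a heaviest path in a graph) can be illustrated with a simplified sketch. It assumes that each candidate envelope is reduced to a hypothetical `Envelope` record holding its m/z span and a single additive score, and that two envelopes count as "independent" when their m/z spans do not overlap. Both are simplifying assumptions for illustration, not MS-Deconv's actual graph construction. Under them, envelopes ordered by end m/z form a DAG, and a heaviest path is found by dynamic programming.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Envelope:
    """Hypothetical candidate envelope: the m/z span of its peaks and a score."""
    start_mz: float
    end_mz: float
    score: float


def heaviest_path_selection(envs: List[Envelope]) -> Tuple[float, List[Envelope]]:
    """Return (total score, envelopes) for a heaviest path in the envelope DAG.

    Vertices are envelopes ordered by end m/z; an edge i -> j exists when
    envelope i ends strictly before envelope j starts, so every path is a set
    of non-overlapping (here: "independent") envelopes.
    """
    order = sorted(envs, key=lambda e: e.end_mz)  # a topological order of the DAG
    n = len(order)
    best = [0.0] * n  # best[j]: weight of the heaviest path ending at vertex j
    prev = [-1] * n   # predecessor of j on that path (-1: path starts at j)
    for j in range(n):
        best[j] = order[j].score
        for i in range(j):
            if order[i].end_mz < order[j].start_mz and best[i] + order[j].score > best[j]:
                best[j] = best[i] + order[j].score
                prev[j] = i
    if n == 0 or max(best) <= 0:
        return 0.0, []  # selecting nothing beats any non-positive path
    j = max(range(n), key=best.__getitem__)
    total = best[j]
    path = []
    while j != -1:  # walk predecessors back to the start of the path
        path.append(order[j])
        j = prev[j]
    return total, path[::-1]
```

For example, with candidates spanning 500.0 to 503.0 (score 5), 502.0 to 506.0 (score 8) and 504.0 to 507.0 (score 4), the sketch selects the first and third envelopes for a total of 9, beating the single overlapping envelope of score 8. Because the abstract stresses that MS-Deconv scores sets of envelopes rather than individual ones, a faithful version would need a set-level scoring function in place of the per-vertex scores used here.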

Highlights

  • Top-down proteomics studies intact proteins, enabling new opportunities for analyzing post-translational modifications

  • In contrast to the bottom-up approach, where proteins are first digested into peptides and the peptide mixture is analyzed by mass spectrometry, the top-down approach analyzes intact proteins

  • Because of natural isotopes, fragment ions with the same chemical formula and charge state are usually represented in tandem mass spectra by a collection of peaks called an isotopomer envelope
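To make the notion of an isotopomer envelope concrete, the sketch below lists the m/z positions at which the peaks of one envelope would appear for a given monoisotopic neutral mass and positive charge. It assumes protonated ions and approximates the spacing between isotopic peaks by the 13C/12C mass difference (about 1.003355 Da). Real tools may use a slightly different average spacing, and the helper name `envelope_mz` is hypothetical.

```python
from typing import List

PROTON_MASS = 1.007276      # Da, mass of a proton (positive mode, protonated ions)
ISOTOPE_SPACING = 1.003355  # Da, 13C - 12C mass difference (approximate peak spacing)


def envelope_mz(mono_mass: float, charge: int, n_peaks: int) -> List[float]:
    """m/z values of the first n_peaks isotopic peaks of an ion.

    Peak k corresponds to the neutral mass mono_mass + k * ISOTOPE_SPACING
    carrying `charge` extra protons, so adjacent peaks are ISOTOPE_SPACING / charge apart.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return [(mono_mass + k * ISOTOPE_SPACING + charge * PROTON_MASS) / charge
            for k in range(n_peaks)]
```

For a 10,000 Da fragment at charge 10, adjacent peaks are only about 0.1 m/z apart. This is why envelopes of different highly charged fragments can interleave on the m/z axis.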


Summary

EXPERIMENTAL PROCEDURES

A valid envelope for a theoretical pattern of $n$ peaks may contain at most one unmatched peak and must have at least $\max\{3,\, n-3\}$ consecutive matched peaks. Using these constraints, most noise envelopes are removed from the candidate envelope list.

We describe an algorithm for finding a subset of mutually independent envelopes with the maximum score from a set of candidate envelopes. We then redefine the scoring function for the case in which selected envelopes are allowed to share peaks.

To extract the monoisotopic masses of the selected envelopes, we define a distance between a theoretical isotopic distribution and an experimental envelope. This function is used to address the notoriously difficult problem of correcting ±1 Da errors in the list of monoisotopic masses (see the supplemental material for details).
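The validity constraint stated above (at most one unmatched peak and at least max{3, n − 3} consecutive matched peaks, where n is the number of theoretical peaks) can be sketched as a filter over per-peak match flags. The matching step is an assumption added for illustration: the hypothetical helper `match_flags` calls a theoretical peak matched when some observed peak lies within a ppm tolerance, and the default `ppm_tol=10.0` is an arbitrary placeholder, not a value taken from MS-Deconv.

```python
from bisect import bisect_left
from typing import List, Sequence


def match_flags(theo_mz: Sequence[float], observed_mz: Sequence[float],
                ppm_tol: float = 10.0) -> List[bool]:
    """Flag each theoretical peak as matched if an observed peak lies within ppm_tol."""
    obs = sorted(observed_mz)
    flags = []
    for mz in theo_mz:
        tol = mz * ppm_tol * 1e-6
        i = bisect_left(obs, mz - tol)  # first observed peak >= mz - tol
        flags.append(i < len(obs) and obs[i] <= mz + tol)
    return flags


def is_valid_envelope(flags: Sequence[bool]) -> bool:
    """At most one unmatched peak and a matched run of length >= max(3, n - 3)."""
    n = len(flags)
    unmatched = sum(1 for f in flags if not f)
    longest = run = 0
    for f in flags:
        run = run + 1 if f else 0  # length of the current run of matched peaks
        longest = max(longest, run)
    return unmatched <= 1 and longest >= max(3, n - 3)
```

Note that the two conditions are not redundant. An 8-peak envelope with its single unmatched peak in the middle has a longest matched run of only 4, which is below max{3, 5} = 5, so it is rejected.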

RESULTS AND DISCUSSION
[Table residue; recoverable column labels: No output masses | NME and methylation | Possible PTM]
