Abstract
Top-down proteomics studies intact proteins, enabling new opportunities for analyzing post-translational modifications. Because tandem mass spectra of intact proteins are very complex, spectral deconvolution (grouping peaks into isotopomer envelopes) is a key initial stage for their interpretation. In such spectra, isotopomer envelopes of different protein fragments span overlapping regions on the m/z axis and even share spectral peaks. This raises both pattern recognition and combinatorial challenges for spectral deconvolution. We present MS-Deconv, a combinatorial algorithm for spectral deconvolution. The algorithm first generates a large set of candidate isotopomer envelopes for a spectrum, then represents the spectrum as a graph, and finally selects its highest scoring subset of envelopes as a heaviest path in the graph. In contrast with other approaches, the algorithm scores sets of envelopes rather than individual envelopes. We demonstrate that MS-Deconv improves on Thrash and Xtract in both the number of correctly recovered monoisotopic masses and speed. We applied MS-Deconv to a large set of top-down spectra from Yersinia rohdei (with a still unsequenced genome) and further matched them against the protein database of the related, sequenced bacterium Yersinia enterocolitica. MS-Deconv is available at http://proteomics.ucsd.edu/Software.html.
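The final selection step mentioned above, finding a heaviest path in a graph, can be illustrated with a generic dynamic program over a directed acyclic graph. This is a minimal sketch only: the abstract does not specify how MS-Deconv builds its graph, so the node names, the weights, and the function `heaviest_path` are hypothetical and stand in for whatever envelope-set scores the real algorithm uses.

```python
from collections import defaultdict


def heaviest_path(node_weights, edges):
    """Return (total_weight, path) for the heaviest path in a DAG.

    node_weights: dict mapping node -> weight (hypothetical scores)
    edges: iterable of (u, v) pairs; the graph is assumed to be acyclic
    """
    succ = defaultdict(list)
    indeg = {n: 0 for n in node_weights}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Topological order via Kahn's algorithm.
    order = []
    ready = [n for n in node_weights if indeg[n] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(node_weights):
        raise ValueError("graph contains a cycle")

    # best[n] is the weight of the heaviest path ending at n.
    best = dict(node_weights)
    prev = {n: None for n in node_weights}
    for u in order:
        for v in succ[u]:
            candidate = best[u] + node_weights[v]
            if candidate > best[v]:
                best[v] = candidate
                prev[v] = u

    # Walk back from the best end node to recover the path.
    end = max(best, key=best.get)
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = prev[node]
    return best[end], path[::-1]


weights = {"a": 2, "b": 5, "c": 1, "d": 4}
links = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
total, path = heaviest_path(weights, links)
```

Because every node is processed after all of its predecessors, the running time is linear in the number of nodes and edges, which is what makes a path formulation attractive when the candidate set is large.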
Highlights
Top-down proteomics studies intact proteins, enabling new opportunities for analyzing post-translational modifications
In contrast to the bottom-up approach, where proteins are first digested into peptides and the peptide mixture is analyzed by mass spectrometry, the top-down approach analyzes intact proteins
Because of the existence of natural isotopes, fragment ions of the same chemical formula and charge state are usually represented by a collection of spectral peaks in tandem mass spectra called an isotopomer envelope
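To make the notion of an isotopomer envelope concrete, the sketch below lists the m/z positions at which the peaks of one envelope would be expected for a given monoisotopic mass and charge state. It assumes positive-mode protonated ions and approximates the spacing between neighboring isotopic peaks by the 13C/12C mass difference; the constants and the function names `envelope_mz` and `mono_mass_from_mz` are illustrative and not taken from MS-Deconv.

```python
PROTON_MASS = 1.007276      # Da, mass of the charge carrier (assumed protonation)
ISOTOPE_SPACING = 1.003355  # Da, approximate 13C - 12C mass difference


def envelope_mz(mono_mass, charge, n_peaks):
    """Expected m/z of the first n_peaks isotopic peaks of one envelope."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return [
        (mono_mass + k * ISOTOPE_SPACING) / charge + PROTON_MASS
        for k in range(n_peaks)
    ]


def mono_mass_from_mz(mz, charge):
    """Neutral monoisotopic mass implied by the monoisotopic peak's m/z."""
    return (mz - PROTON_MASS) * charge


peaks = envelope_mz(10000.0, 10, 4)
```

Neighboring peaks of a charge-z envelope sit roughly 1/z apart on the m/z axis, which is why envelopes of different fragments and charge states can interleave and even share peaks.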
Summary
A valid envelope allows at most one unmatched peak in the pattern and must contain at least max{3, n − 3} consecutive matched peaks. Using these constraints, most noise envelopes are removed from the candidate envelope list. We describe an algorithm for finding a subset of (mutually independent) envelopes with the maximum score from a set of n candidate envelopes. We redefine the scoring function for the case when selected envelopes are allowed to share peaks. To extract the monoisotopic masses of the selected envelopes, we define a distance between a theoretical isotopic distribution and an experimental envelope. This function is used to address the notoriously difficult problem of correcting ±1-Da errors in the list of monoisotopic masses (see the supplemental material for details).
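The validity constraint on candidate envelopes can be sketched as a small filter. The sketch assumes that n in max{3, n − 3} is the number of peaks in the theoretical pattern and that each pattern peak has already been marked as matched or unmatched against the spectrum; the function name `is_valid_envelope` and the boolean-list input are hypothetical.

```python
def is_valid_envelope(matched):
    """Check one candidate envelope against the two stated constraints.

    matched: list of booleans, one per peak of the theoretical pattern,
             True if an experimental peak was matched to it (assumed input)
    """
    n = len(matched)
    unmatched = sum(1 for m in matched if not m)

    # Length of the longest run of consecutive matched peaks.
    longest = 0
    run = 0
    for m in matched:
        run = run + 1 if m else 0
        longest = max(longest, run)

    return unmatched <= 1 and longest >= max(3, n - 3)


candidates = {
    "all_matched": [True] * 5,
    "gap_in_middle": [True, True, False, True, True],
    "gap_near_end": [True, True, True, False, True],
    "two_gaps": [True, False, False, True, True, True],
}
kept = [name for name, flags in candidates.items() if is_valid_envelope(flags)]
```

For short patterns the floor of three consecutive matches dominates, while for longer patterns the n − 3 term requires most of the pattern to be matched contiguously.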