Abstract

Algorithmic search engines bridge the gap between large tandem mass spectrometry data sets and the identification of proteins associated with biological samples. Improvements in these tools can greatly enhance biological discovery. We present a new scoring scheme for comparing tandem mass spectra with a protein sequence database. The MASPIC (Multinomial Algorithm for Spectral Profile-based Intensity Comparison) scorer converts an experimental tandem mass spectrum into a m/z profile of probability and then scores peak lists from potential candidate peptides using a multinomial distribution model. The MASPIC scoring scheme incorporates intensity, spectral peak density variations, and m/z error distribution associated with peak matches into a multinomial distribution. The scoring scheme was validated on two standard protein mixtures and an additional set of spectra collected on a complex ribosomal protein mixture from Rhodopseudomonas palustris. The results indicate a 5-15% improvement over Sequest for high-confidence identifications. The performance gap grows as sequence database size increases. Additional tests on spectra from proteinase-K digest data showed similar performance improvements demonstrating the advantages in using MASPIC for studying proteins digested with less specific proteases. All these investigations show MASPIC to be a versatile and reliable system for peptide tandem mass spectral identification.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call