Abstract

Correct identification of peptides and proteins in complex biological samples from proteomic mass spectra is a challenging problem in bioinformatics. The sensitivity and specificity of identification algorithms depend on their underlying scoring methods: some are more sensitive, others more specific. For high-throughput, automated peptide identification, it is desirable to control the trade-off between sensitivity and specificity. Combinations of algorithms, called 'consensus methods', have been shown to provide more accurate results than individual algorithms. However, because algorithms have proliferated and their internal settings vary, a systematic understanding of the relative performance of individual and consensus methods is lacking. We performed an in-depth analysis of various approaches to consensus scoring using known protein mixtures, and evaluated the performance of 2310 settings generated from the consensus of three different search algorithms: Mascot, Sequest, and X!Tandem. Our findings indicate that the union of Mascot, Sequest, and X!Tandem performed well in terms of overall accuracy. Among all consensus methods tested, those using 80-99.9% protein probability and/or a minimum of 2 peptides and/or 0-50% minimum peptide probability for protein identification performed better on average in terms of overall accuracy. The results also suggest method selection strategies that provide direct control over sensitivity and specificity.
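
The trade-off between union and intersection consensus can be illustrated with a minimal sketch. It is not the procedure used in the study: the protein labels (`P1` to `P6`), the per-engine calls, and the helper names `consensus` and `sensitivity_specificity` are all hypothetical, and each engine's output is assumed to have already been reduced to a set of accepted protein identifiers scored against a known mixture.

```python
from typing import Dict, Set, Tuple


def consensus(calls: Dict[str, Set[str]], mode: str = "union") -> Set[str]:
    """Combine per-engine protein identifications (hypothetical helper)."""
    sets = list(calls.values())
    if not sets:
        return set()
    if mode == "union":
        # Accept a protein if any engine reports it.
        return set().union(*sets)
    if mode == "intersection":
        # Accept a protein only if every engine reports it.
        return set.intersection(*sets)
    raise ValueError(f"unknown consensus mode: {mode}")


def sensitivity_specificity(
    identified: Set[str], true_proteins: Set[str], candidates: Set[str]
) -> Tuple[float, float]:
    """Score identifications against a known protein mixture."""
    negatives = candidates - true_proteins
    tp = len(identified & true_proteins)
    fn = len(true_proteins - identified)
    fp = len(identified & negatives)
    tn = len(negatives - identified)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Made-up example data: three proteins are truly present out of six candidates.
true_proteins = {"P1", "P2", "P3"}
candidates = {"P1", "P2", "P3", "P4", "P5", "P6"}
calls = {
    "Mascot": {"P1", "P2", "P4"},
    "Sequest": {"P1", "P3"},
    "X!Tandem": {"P1", "P2", "P5"},
}

union_result = sensitivity_specificity(
    consensus(calls, "union"), true_proteins, candidates
)
intersection_result = sensitivity_specificity(
    consensus(calls, "intersection"), true_proteins, candidates
)
print("union:", union_result)
print("intersection:", intersection_result)
```

In this toy data the union recovers every true protein but admits two false positives, while the intersection admits none but misses two true proteins, which is the kind of sensitivity versus specificity control the abstract refers to.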
