An improved machine learning protocol for the identification of correct Sequest search results

Morten Källberg, Hui Lu

doi:10.1186/1471-2105-11-591

Abstract

Background

Mass spectrometry has become a standard method by which the proteomic profile of cell or tissue samples is characterized. To take full advantage of tandem mass spectrometry (MS/MS) techniques in large-scale protein characterization studies, robust and consistent data analysis procedures are crucial. In this work we present a machine learning based protocol for the identification of correct peptide-spectrum matches from Sequest database search results, improving on previously published protocols.

Results

The developed model improves on published machine learning classification procedures by 6% as measured by the area under the ROC curve. Further, we show how the developed model can be presented as an interpretable tree of additive rules, thereby effectively removing the 'black-box' notion often associated with machine learning classifiers and allowing for comparison with expert rules of thumb. Finally, a method for extending the developed peptide identification protocol to give probabilistic estimates of the presence of a given protein is proposed and tested.

Conclusions

We demonstrate the construction of a high-accuracy classification model for Sequest search results from MS/MS spectra obtained using MALDI ionization. The developed model performs well in identifying correct peptide-spectrum matches and is easily extendable to the protein identification problem. The relative ease with which additional experimental parameters can be incorporated into the classification framework, to give additional discriminatory power, allows for future tailoring of the model to take advantage of information from specific instrument set-ups.
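The abstract reports its improvement in terms of the area under the ROC curve. As a minimal sketch of what that metric measures for peptide-spectrum matches, the AUC can be read as the probability that a randomly chosen correct match receives a higher classifier score than a randomly chosen incorrect one. The function name `roc_auc` and the toy scores below are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    scores: classifier output per peptide-spectrum match (higher = more likely correct).
    labels: 1 for a correct match, 0 for an incorrect one.
    Ties between a correct and an incorrect match count as half a win.
    """
    correct = [s for s, y in zip(scores, labels) if y == 1]
    incorrect = [s for s, y in zip(scores, labels) if y == 0]
    if not correct or not incorrect:
        raise ValueError("need at least one correct and one incorrect match")

    wins = 0.0
    for c in correct:
        for i in incorrect:
            if c > i:
                wins += 1.0
            elif c == i:
                wins += 0.5
    # Fraction of (correct, incorrect) pairs that are ranked in the right order.
    return wins / (len(correct) * len(incorrect))


# Hypothetical scores for four matches: two correct, two incorrect.
example_auc = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

The pairwise form is quadratic in the number of matches, which is fine for illustration; a rank-based computation would be the usual choice for large search result sets.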

Highlights

Mass spectrometry has become a standard method by which the proteomic profile of cell or tissue samples is characterized
The analysis of composite protein mixtures by use of mass spectrometry techniques has become a standard methodology for characterizing the proteomic profile of a cell or tissue sample [1]
We present the details of each step and describe a method for extending the developed protocol into a probabilistic protein identification method

Summary

Introduction

The analysis of composite protein mixtures by mass spectrometry has become a standard methodology for characterizing the proteomic profile of a cell or tissue sample [1]. Efficient use of the MS/MS technique [8] in large-scale protein characterization studies requires robust and consistent data analysis procedures. To this end, the combination of spectral data and the vast amount of genomic [...] To ensure an effective production pipeline, a fully automated method for confident validation of the results produced by the above-mentioned search algorithms is essential.

Journal: BMC Bioinformatics
Publication Date: Dec 1, 2010
Citations: 42
License type: CC BY 2.0