An unsupervised machine learning method for assessing quality of tandem mass spectra

Wenjun Lin, Jianxin Wang, Fang-Xiang Wu, Wen-Jun Zhang

doi:10.1186/1477-5956-10-s1-s12

Open Access

https://doi.org/10.1186/1477-5956-10-s1-s12

Abstract

Background: In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. However, the majority of these spectra are of poor quality, and searching them for peptides wastes time. Quality assessment before database search is therefore very useful in the pipeline of protein identification via tandem mass spectra, especially for reducing search time and the number of false identifications. Most existing methods for quality assessment are supervised machine learning methods based on a number of features that describe the quality of tandem mass spectra. These methods need training datasets in which the quality of every spectrum is known, and such datasets are usually unavailable for new data.

Results: This study proposes an unsupervised machine learning method for quality assessment of tandem mass spectra that requires no training dataset. The proposed method estimates the conditional probabilities of spectra being of high quality from the quality assessments based on individual features. The probabilities are estimated by solving a constrained optimization problem. An efficient algorithm is developed to solve this problem and is proved to converge. Experimental results on two datasets show that searching only the tandem spectra judged to be of high quality by the proposed method saves about 56% and 62% of database searching time while losing only a small number of high-quality spectra.

Conclusions: The results indicate that the proposed method performs well for the quality assessment of tandem mass spectra and that the way the conditional probabilities are estimated is effective.
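The abstract does not spell out the constrained optimization problem, so the following is only a rough illustration of the general idea of combining per-feature assessments without labels. It is not the authors' algorithm: it swaps in a standard two-class latent model fitted by expectation-maximization (EM). The function name `em_consensus`, the binary `votes` matrix, and the majority-vote initialisation are all assumptions made for this sketch.

```python
import math


def em_consensus(votes, n_iter=100, tol=1e-8):
    """Estimate P(high quality | votes) per spectrum, without labels.

    votes[i][j] is 1 if a (hypothetical) single-feature assessment j
    calls spectrum i high quality, else 0. This is a generic latent-class
    EM sketch, not the constrained optimization used in the paper.
    """
    n, m = len(votes), len(votes[0])
    eps = 1e-6  # keeps probabilities away from exactly 0 or 1

    def clamp(x, lo=eps, hi=1.0 - eps):
        return min(hi, max(lo, x))

    # Start from a softened majority vote so that class 1 means "high quality".
    post = [clamp(sum(row) / m, 0.05, 0.95) for row in votes]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # M-step: re-estimate the class prior and each assessment's rates.
        w1 = sum(post)
        w0 = n - w1
        prior = clamp(w1 / n)
        sens = [clamp(sum(p * r[j] for p, r in zip(post, votes)) / max(w1, eps))
                for j in range(m)]
        fpr = [clamp(sum((1 - p) * r[j] for p, r in zip(post, votes)) / max(w0, eps))
               for j in range(m)]
        # E-step: posterior of "high quality" for every spectrum (log domain).
        ll, new_post = 0.0, []
        for row in votes:
            log1, log0 = math.log(prior), math.log(1 - prior)
            for j, v in enumerate(row):
                log1 += math.log(sens[j] if v else 1 - sens[j])
                log0 += math.log(fpr[j] if v else 1 - fpr[j])
            top = max(log1, log0)
            norm = top + math.log(math.exp(log1 - top) + math.exp(log0 - top))
            new_post.append(math.exp(log1 - norm))
            ll += norm
        post = new_post
        if abs(ll - prev_ll) < tol:  # stop once the log-likelihood stalls
            break
        prev_ll = ll
    return post
```

In a pipeline like the one described, spectra whose estimated probability falls below a chosen cutoff would be skipped before database search. The cutoff itself would trade search time saved against high-quality spectra lost, and it is not specified here.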

Highlights

In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra.
The results demonstrate that the sets with a small number of features outperform the full set of features, which indicates that these features together can better describe the quality of tandem mass spectra and improve the performance of tandem mass spectral quality assessment.
Conclusions and future work: This paper has presented an unsupervised machine learning method to integrate the assessments based on individual features into a consensus assessment with higher precision.

Summary

Introduction

In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. The majority of these spectra are of poor quality, and searching them for peptides wastes time. Quality assessment before database search is therefore very useful in the pipeline of protein identification via tandem mass spectra, especially for reducing search time and the number of false identifications. Most existing methods for quality assessment are supervised machine learning methods based on a number of features that describe the quality of tandem mass spectra. These methods need training datasets in which the quality of every spectrum is known, and such datasets are usually unavailable for new data. One task in proteomics is to identify proteins in biological complexes via peptides identified from tandem mass spectra. It is worthwhile to develop an automatic quality assessment algorithm that discriminates high-quality from poor-quality spectra before further interpretation.

Journal: Proteome Science
Publication Date: Jun 21, 2012
Citations: 21
License type: cc-by
