Abstract

Background: In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. However, the majority of these spectra are of poor quality, and searching them for peptides wastes time. Quality assessment before database search is therefore very useful in the protein identification pipeline, especially for reducing search time and decreasing false identifications. Most existing quality assessment methods are supervised machine learning methods built on a number of features that describe the quality of tandem mass spectra. These methods require training datasets in which the quality of every spectrum is known, and such datasets are usually unavailable for new data.

Results: This study proposes an unsupervised machine learning method for quality assessment of tandem mass spectra that requires no training dataset. The method estimates the conditional probability that a spectrum is of high quality from the quality assessments based on individual features. These probabilities are estimated by solving a constrained optimization problem, for which an efficient algorithm is developed and proved to converge. Experimental results on two datasets show that searching only the spectra judged high quality by the proposed method saves about 56% and 62% of database search time, respectively, while losing only a small number of high-quality spectra.

Conclusions: The results indicate that the proposed method performs well for quality assessment of tandem mass spectra and that the way it estimates the conditional probabilities is effective.
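The abstract does not spell out the constrained optimization problem, so the sketch below swaps in a generic, well-known stand-in to show the idea of turning several unlabeled per-feature quality calls into one probability per spectrum: a two-class latent class model (Dawid-Skene style) fitted by expectation-maximization. It is not the authors' algorithm. The function name `em_consensus`, the 0/1 encoding of each feature-based call, and the assumption that the calls are conditionally independent given the true quality are all hypothetical choices made for illustration.

```python
import numpy as np


def em_consensus(votes, n_iter=200, tol=1e-8, eps=1e-6):
    """Illustrative latent-class EM (NOT the paper's constrained-optimization method).

    votes: (n_spectra, n_features) array of 0/1; 1 means the assessment based on
    that single feature calls the spectrum high quality. Assumes (hypothetically)
    that the calls are conditionally independent given the true quality.
    Returns q = P(high quality | calls) per spectrum, plus per-feature
    sensitivity a_k = P(call=1 | high) and specificity b_k = P(call=0 | poor).
    """
    y = np.asarray(votes, dtype=float)
    # Initialise posteriors with the soft majority vote; this also fixes which
    # latent class is interpreted as "high quality".
    q = y.mean(axis=1)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # M-step: class prior, sensitivities and specificities from current posteriors.
        pi = np.clip(q.mean(), eps, 1 - eps)
        a = np.clip(q @ y / (q.sum() + eps), eps, 1 - eps)
        b = np.clip((1 - q) @ (1 - y) / ((1 - q).sum() + eps), eps, 1 - eps)
        # E-step: log joint probability of each class together with the observed calls.
        log_hi = np.log(pi) + y @ np.log(a) + (1 - y) @ np.log(1 - a)
        log_lo = np.log(1 - pi) + y @ np.log(1 - b) + (1 - y) @ np.log(b)
        log_z = np.logaddexp(log_hi, log_lo)  # log marginal likelihood per spectrum
        q = np.exp(log_hi - log_z)            # numerically stable posterior
        ll = log_z.sum()
        if ll - prev_ll < tol:                # EM never decreases the likelihood
            break
        prev_ll = ll
    return q, a, b


if __name__ == "__main__":
    # Synthetic demo: 5 hypothetical feature-based assessors of varying reliability.
    rng = np.random.default_rng(0)
    truth = rng.random(2000) < 0.3
    sens = np.array([0.85, 0.8, 0.9, 0.75, 0.85])
    spec = np.array([0.8, 0.85, 0.7, 0.9, 0.8])
    p_call = np.where(truth[:, None], sens, 1 - spec)
    votes = (rng.random((2000, 5)) < p_call).astype(int)
    q, a, b = em_consensus(votes)
    print("kept for search:", int((q > 0.5).sum()), "of", len(q))
```

In a filtering pipeline, only spectra whose `q` exceeds a chosen threshold would be sent to the database search; raising the threshold saves more search time at the cost of discarding more high-quality spectra.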

Highlights

  • In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra

  • The results demonstrate that feature sets with a small number of features outperform the full feature set, which indicates that these smaller sets of features together better describe the quality of tandem mass spectra and improve the performance of tandem mass spectral quality assessment

  • This paper has presented an unsupervised machine learning method that integrates the assessments based on individual features into a consensus assessment with higher precision


Summary

Introduction

In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. The majority of these spectra are of poor quality, and searching them for peptides wastes time. Quality assessment before database search is therefore very useful in the protein identification pipeline, especially for reducing search time and decreasing false identifications. Most existing quality assessment methods are supervised machine learning methods built on a number of features that describe the quality of tandem mass spectra. These methods require training datasets in which the quality of every spectrum is known, and such datasets are usually unavailable for new data. One important task in proteomics is identifying proteins in biological complexes from peptides identified via tandem mass spectra. It is therefore worthwhile to develop an automatic quality assessment algorithm that discriminates high-quality from poor-quality spectra before further interpretation.
