Abstract
One of the major challenges in proteomics is peptide identification from mass spectra that contain a high proportion of noise and only a small number of signal (b-/y-ion) peaks. The accuracy and reliability of peptide identification in such highly imbalanced MS/MS data can be improved by a preprocessing step, applied before peptide identification, that discriminates b-/y-ions from noise peaks in the spectra. In this study, we report a genetic programming (GP)-based preprocessing method for de-noising highly imbalanced and noisy CID MS/MS spectra. GP has become a popular machine learning method based on automatic programming. Here, GP preprocesses the highly noisy MS/MS spectra by classifying each peak as either a noise peak or a signal peak, that is, as a binary classification task. A set of spectral fragment features based on the MS/MS fragmentation rules is extracted from the dataset, and the discriminating ability of these features is investigated with GP. An MS/MS spectral dataset containing thousands of spectra is used to train the GP model. Because the GP tree-based representation performs implicit feature selection during the evolutionary process, the evolved GP model with its selected features is compared with the best threshold-based method. The results show that the GP method improved the reliability of peptide identification and increased the identification rate of a de novo sequencing tool, PEAKS, to 99.4%, from the 80.1% achieved by the best threshold-based method. Moreover, peptide identification by a database search tool, SEQUEST, using the data preprocessed by the GP method showed a statistically significant improvement over the other methods.
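To make the binary peak-classification setting concrete, the following is a minimal Python sketch of how peaks in a spectrum with a known peptide could be labelled as signal (b-/y-ion) or noise. It is not the authors' procedure: the function names `by_ion_mz` and `label_peaks`, the 0.5 Da tolerance, and the restriction to singly charged, unmodified fragments are all assumptions made for illustration. The residue masses are the standard monoisotopic values for a small subset of amino acids.

```python
# Standard monoisotopic residue masses (Da) for a subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def by_ion_mz(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion with i residues
        ions.append(sum(masses[i:]) + WATER + PROTON)  # complementary y ion
    return ions


def label_peaks(peak_mzs, peptide, tol=0.5):
    """Label each peak 1 (signal) if within tol Da of a b-/y-ion, else 0 (noise).

    The tolerance is a hypothetical choice, not a value from the paper.
    """
    ions = by_ion_mz(peptide)
    return [int(any(abs(mz - ion) <= tol for ion in ions)) for mz in peak_mzs]


if __name__ == "__main__":
    # Three observed peaks for the toy peptide "GAS".
    print(label_peaks([58.03, 150.0, 106.05], "GAS"))
```

Labels of this kind, paired with per-peak features, are the sort of input a binary classifier such as an evolved GP tree would be trained on; the class imbalance arises because most peaks in a real spectrum receive the noise label.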
More From: Journal of the American Society for Mass Spectrometry