Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy.

Sean J Mcilwain, Zhijie Wu, Molly Wetzel, Yutong Jin, Ying Ge, Kent Wenger, Daniel Belongia, Irene M Ong

doi:10.1021/jasms.0c00035

Abstract

Top-down mass spectrometry (MS) is a powerful tool for the identification and comprehensive characterization of proteoforms arising from alternative splicing, sequence variation, and post-translational modifications. However, the complex data sets generated by top-down MS experiments require multiple sequential data processing steps before proteoforms can be identified and characterized. One critical step is the deconvolution of the complex isotopic distributions that arise from naturally occurring isotopes. Multiple algorithms are currently available to deconvolute top-down mass spectra, and they produce different deconvoluted peak lists whose accuracy varies when compared with true positive annotations. In this study, we designed a machine learning strategy that processes and combines the peak lists from different deconvolution results. By optimizing clustering results, deconvolution results from the THRASH, TopFD, MS-Deconv, and SNAP algorithms were combined into consensus peak lists at various thresholds using either a simple voting ensemble method or a random forest machine learning algorithm. The random forest algorithm had the better predictive performance: its consensus peak lists achieved, on average, a recall (true positive rate) of 0.60 and a precision (positive predictive value) of 0.78. It outperformed the single best algorithm, which achieved a recall of only 0.47 and a precision of 0.58. This machine learning strategy enhanced the accuracy and confidence of protein identification during database searches by accelerating the detection of true positive peaks while filtering out false positive peaks. Thus, this method shows promise for enhancing proteoform identification and characterization in high-throughput top-down proteomics data analysis.
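
The simple voting ensemble mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 10 ppm clustering tolerance, and the chained one-dimensional clustering rule are all assumptions made for illustration, and the random forest step is not shown. Peaks reported by different tools are grouped when neighbouring masses lie within the tolerance, a cluster is kept when enough distinct tools voted for it, and the consensus list is then scored for precision and recall against annotated masses.

```python
def cluster_peaks(peak_lists, tol_ppm=10.0):
    """Group masses from all tools; a peak joins the current cluster when it
    lies within tol_ppm of the previous (sorted) peak. Hypothetical rule."""
    tagged = sorted(
        (mass, tool) for tool, masses in peak_lists.items() for mass in masses
    )
    clusters = []
    for mass, tool in tagged:
        if clusters and (mass - clusters[-1][-1][0]) / mass * 1e6 <= tol_ppm:
            clusters[-1].append((mass, tool))
        else:
            clusters.append([(mass, tool)])
    return clusters


def vote_consensus(peak_lists, min_votes=2, tol_ppm=10.0):
    """Keep clusters reported by at least min_votes distinct tools and
    return the mean mass of each kept cluster."""
    consensus = []
    for cluster in cluster_peaks(peak_lists, tol_ppm):
        tools = {tool for _, tool in cluster}
        if len(tools) >= min_votes:
            consensus.append(sum(m for m, _ in cluster) / len(cluster))
    return consensus


def precision_recall(predicted, truth, tol_ppm=10.0):
    """Precision = matched predictions / predictions;
    recall = matched annotations / annotations."""
    def matches(a, b):
        return abs(a - b) / b * 1e6 <= tol_ppm

    hit_pred = sum(any(matches(p, t) for t in truth) for p in predicted)
    hit_truth = sum(any(matches(p, t) for p in predicted) for t in truth)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_truth / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up monoisotopic masses (Da), purely illustrative.
    peak_lists = {
        "THRASH": [1000.000, 2000.000, 3000.000],
        "TopFD": [1000.002, 2000.004, 4000.000],
        "MS-Deconv": [1000.004, 5000.000],
        "SNAP": [2000.001, 6000.000],
    }
    truth = [1000.001, 2000.002, 3000.000]
    for votes in (1, 2, 3):
        consensus = vote_consensus(peak_lists, min_votes=votes)
        print(votes, precision_recall(consensus, truth))
```

Raising `min_votes` trades recall for precision, which is the kind of threshold sweep the abstract refers to when it describes consensus peak lists "at various thresholds".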

Journal: Journal of the American Society for Mass Spectrometry
Publication Date: Mar 30, 2020
Citations: 19

Similar Papers

Shining a spotlight on intact proteins
PROTEOMICS | VOL. 14
01 May 2014

MASH Suite Pro: A Comprehensive Software Tool for Top-Down Proteomics
Wenxuan Cai ... Ying Ge
Molecular & Cellular Proteomics | VOL. 15
01 Feb 2016

Deconvolution and Database Search of Complex Tandem Mass Spectra of Intact Proteins
Xiaowen Liu ... Pavel A Pevzner
Molecular & Cellular Proteomics | VOL. 9
01 Dec 2010

Seeing the complete picture: proteins in top-down mass spectrometry.
Frederik Lermyte ... Tanja Habeck
Essays in Biochemistry | VOL. 67
29 Mar 2023
