The aim of deconvoluting top-down mass spectra is to recognize monoisotopic peaks from the experimental envelopes in raw mass spectra, so accurately assessing the similarity between theoretical and experimental envelopes is a critical step in deconvolution. Existing evaluation methods rely primarily on intensity differences and m/z similarity and may therefore lack a comprehensive assessment. To overcome this limitation, more effective features for assessing this correspondence need to be systematically explored and identified. We present enhanced feature representation for isotopic envelope evaluation (FREE), which derives diverse feature representations that capture fundamental physical attributes of envelopes, including peak intensity and envelope shape. We trained FREE and evaluated its performance on both the ovarian tumor (OT) data set (human OT cells) and the zebrafish (ZF) data set (brain of mature female ZF). Compared with the state-of-the-art method, FREE achieves higher performance on multiple evaluation metrics across both the OT and ZF data sets, particularly precision, and it correctly predicts a greater number of positive envelopes among the envelopes ranked highest by score. Moreover, on a cross-species ZF data set, FREE identified more proteoform-spectrum matches (PrSMs) than EnvCNN (52,927 versus 50,795). Combined with TopFD, FREE also identified 117,883 fragment ions, surpassing the 97,554 fragment ions identified by EnvCNN combined with TopFD. To further validate the performance of FREE, we tested 10 cross-species top-down proteomes comprising 36 sub-data sets from ProteomeXchange.
The results show that, after deconvolution with TopFD + FREE, TopPIC identifies more PrSMs across these 10 data sets in both the first and second rounds of experiments. These findings underscore the robustness and generalization capability of FREE across diverse proteomes.