Molecular weight (MW) is a key property for improving the accuracy of multidimensional compound identification. In this study, we developed MWFormer, a novel method based on a Transformer encoder that predicts MWs solely from electron ionization mass spectrometry (EI-MS) spectra. On the test set, MWFormer achieves a mean absolute error (MAE) of 6.38 Da, one-sixth of the MAE obtained with the peak interpretation method (PIM). Because of this accuracy, the MWFormer-predicted MW can serve as a filter to eliminate false-positive candidate molecules in multidimensional compound identification. The results show that this MW filter improves recall@3 by nearly 4 percentage points compared with spectrum matching alone. Moreover, combining MWFormer with retention indices (RIs) enables three-dimensional (3D) GC-EI-MS compound identification, which improves recall@3 by nearly 7 percentage points compared with spectrum matching alone. In addition, a user-friendly web service is provided to predict MWs in single or batch mode. All code, data, and models are available at https://github.com/zhanghailiangcsu/MWFormer.
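To illustrate how a predicted MW can act as a filter on spectrum-matching results, and how recall@3 is counted, here is a minimal sketch. It is not the authors' implementation. The `Candidate` structure, the `mw_filter` and `recall_at_k` helpers, the toy candidate list, the predicted MW of 150.0 Da, and the tolerance window (assumed here to be three times the reported 6.38 Da MAE) are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A library hit from spectrum matching (hypothetical structure)."""
    name: str
    mw: float      # molecular weight of the candidate, in Da
    score: float   # spectral similarity score (higher is better)


def mw_filter(candidates, predicted_mw, tolerance):
    """Keep candidates whose MW lies within +/- tolerance Da of the predicted MW,
    preserving the original spectrum-matching order."""
    return [c for c in candidates if abs(c.mw - predicted_mw) <= tolerance]


def recall_at_k(ranked_lists, true_names, k=3):
    """Fraction of queries whose true compound appears among the top-k candidates."""
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, true_names)
        if truth in [c.name for c in ranked[:k]]
    )
    return hits / len(true_names)


# Hypothetical toy query: the true compound is "B", but spectrum matching ranks it 4th.
candidates = sorted(
    [
        Candidate("A", 180.2, 0.91),
        Candidate("C", 250.3, 0.89),
        Candidate("D", 300.1, 0.88),
        Candidate("B", 152.1, 0.85),
    ],
    key=lambda c: c.score,
    reverse=True,
)
predicted_mw = 150.0   # hypothetical model output, in Da
tolerance = 3 * 6.38   # assumed window: three times the reported MAE

filtered = mw_filter(candidates, predicted_mw, tolerance)
print(recall_at_k([candidates], ["B"], k=3))  # spectrum matching alone: 0.0
print(recall_at_k([filtered], ["B"], k=3))    # after MW filtering: 1.0
```

The filter only removes candidates and never reorders the survivors, so a true compound that was pushed out of the top three by heavier or lighter false positives can move back in. A retention-index window could be applied as a second, analogous filter for 3D identification; the tolerance values in practice would need to be tuned on validation data.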