Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is a potent analytical technique utilized for identifying natural products from complex sources. However, due to the structural diversity, annotating LC-MS/MS data of natural products efficiently remains challenging, hindering the discovery process of novel active structures. Here, we introduce MassKG, an algorithm that combines a knowledge-based fragmentation strategy and a deep learning-based molecule generation model to aid in rapid dereplication and the discovery of novel NP structures. Specifically, MassKG has compiled 407,720 known NP structures and, based on this, generated 266,353 new structures using chemical language models for the discovery of potential novel compounds. Furthermore, MassKG demonstrates exceptional performance in spectra annotation compared to state-of-the-art algorithms. To enhance usability, MassKG has been implemented as a web server for annotating tandem mass spectral data (MS/MS, MS2) with a user-friendly interface, automatic reporting, and fragment tree visualization. Lastly, the interpretive capability of MassKG is comprehensively validated through composition analysis and MS annotation of Panax notoginseng, Ginkgo Biloba, Codonopsis pilosula, and Astragalus membranaceus. MassKG is now accessible at https://xomics.com.cn/masskg.
Read full abstract