Abstract

Molecular networking has emerged as a standard approach for natural product (NP) discovery. However, current pipelines based on molecular networks tend to prioritize larger clusters comprising multiple nodes. To address this issue, we present the integrated molecular networking workflow for NP dereplication (IMN4NPD). This approach not only expedites the dereplication of large clusters within the molecular network but also places specific emphasis on self-looped nodes and pairs of nodes, which are often overlooked by current methods. By combining the outputs of various computational tools, we efficiently dereplicate compounds falling into specific categories and provide annotations for both large-cluster nodes and self-looped or paired nodes within the molecular network. Furthermore, we have incorporated fundamentally distinct similarity algorithms, namely Spec2Vec and MS2DeepScore, for constructing the t-SNE network. Through comparison with modified cosine similarity, we observed that integrating additional, diverse spectral similarity measures into the t-SNE network enhanced the ability to dereplicate NPs. Using an ethanol extract of Plumula nelumbinis as a use case, we illustrate that integrating multiple computational solutions with IMN4NPD aids dereplication, especially of self-looped nodes, and the discovery of novel compounds in NPs.
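
The distinction drawn above between large clusters, pairs of nodes, and self-looped nodes can be illustrated with a minimal sketch. This is not the IMN4NPD implementation: the function name `group_components_by_size`, the node identifiers, and the edge list are hypothetical, and the sketch assumes only that the molecular network is available as a list of nodes and a list of edges in which a self-loop is an edge from a node to itself.

```python
from collections import defaultdict


def group_components_by_size(nodes, edges):
    """Split network nodes into singletons, pairs, and larger clusters.

    nodes: iterable of hashable, sortable node ids (hypothetical spectrum ids).
    edges: iterable of (u, v) tuples; a self-loop (u, u) adds no neighbor.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:  # a self-loop does not connect a node to any other node
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    groups = {"singletons": [], "pairs": [], "clusters": []}
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collecting one connected component.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        component.sort()
        if len(component) == 1:
            groups["singletons"].append(component)
        elif len(component) == 2:
            groups["pairs"].append(component)
        else:
            groups["clusters"].append(component)
    return groups


# Hypothetical toy network: "a" is self-looped, "b"-"c" form a pair,
# and "d"-"e"-"f" form a larger cluster.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "a"), ("b", "c"), ("d", "e"), ("e", "f")]
result = group_components_by_size(nodes, edges)
```

In a workflow of this kind, the singleton and pair groups would be the ones routed to additional annotation tools, since they carry little or no network context of their own.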
