Abstract

BackgroundAlkaloids, a class of organic compounds that contain nitrogen bases, are mainly synthesized as secondary metabolites in plants and fungi, and they have a wide range of bioactivities. Although there are thousands of compounds in this class, few of their biosynthesis pathways are fully identified. In this study, we constructed a model to predict their precursors based on a novel kind of neural network called the molecular graph convolutional neural network. Molecular similarity is a crucial metric in the analysis of qualitative structure–activity relationships. However, it is sometimes difficult for current fingerprint representations to emphasize specific features for the target problems efficiently. It is advantageous to allow the model to select the appropriate features according to data-driven decisions for extracting more useful information, which influences a classification or regression problem substantially.ResultsIn this study, we applied a neural network architecture for undirected graph representation of molecules. By encoding a molecule as an abstract graph and applying "convolution" on the graph and training the weight of the neural network framework, the neural network can optimize feature selection for the training problem. By incorporating the effects from adjacent atoms recursively, graph convolutional neural networks can extract the features of latent atoms that represent chemical features of a molecule efficiently. In order to investigate alkaloid biosynthesis, we trained the network to distinguish the precursors of 566 alkaloids, which are almost all of the alkaloids whose biosynthesis pathways are known, and showed that the model could predict starting substances with an averaged accuracy of 97.5%.ConclusionWe have showed that our model can predict more accurately compared to the random forest and general neural network when the variables and fingerprints are not selected, while the performance is comparable when we carefully select 507 variables from 18000 dimensions of descriptors. The prediction of pathways contributes to understanding of alkaloid synthesis mechanisms and the application of graph based neural network models to similar problems in bioinformatics would therefore be beneficial. We applied our model to evaluate the precursors of biosynthesis of 12000 alkaloids found in various organisms and found power-low-like distribution.

Highlights

  • Alkaloids, a class of organic compounds that contain nitrogen bases, are mainly synthesized as secondary metabolites in plants and fungi, and they have a wide range of bioactivities

  • We propose a classification system of alkaloids according to their starting substances based on a graph convolutional neural network (GCNN), which is a model that generalizes convolution operation for abstract graph structures, instead of the operations on 1D or 2D grids of variables that are commonly used in convolutional neural networks (CNN) [14, 15]

  • Single classification in the molecular graph convolutional neural networks (MGCNN) model We evaluated the accuracy of the prediction of starting substances by changing the network size, i.e., the number of convolution stages, from one to six (Fig. 3)

Read more

Summary

Introduction

A class of organic compounds that contain nitrogen bases, are mainly synthesized as secondary metabolites in plants and fungi, and they have a wide range of bioactivities. Molecular similarity is a crucial metric in the analysis of qualitative structure–activity relationships It is sometimes difficult for current fingerprint representations to emphasize specific features for the target problems efficiently. There are projects to provide comprehensive sets of chemical descriptors, for example, PaDEL descriptor generator provides 1875 descriptors and and 12 types of fingerprints (total 16092 bits) [13] Those variables are not always important or relevant with the target features so that feature selection and optimization is indispensable. In the classification of alkaloids, these techniques to extract features from chemical structures were insufficient because of the diverged heterocyclic nitrogenous structures; i.e., 2546 types of ring skeleton were detected in 12,243 alkaloids accumulated in KNApSAcK Core DB [6]. The ring skeleton means the ring system in a chemical compound detected in a simple graph representation of a chemical

Methods
Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.