Abstract

Analysis of chemical graphs is becoming a major research topic in computational molecular biology due to its potential applications to drug design. One of the major approaches in such a study is inverse quantitative structure activity/property relationship (inverse QSAR/QSPR) analysis, which aims to infer chemical structures from given chemical activities/properties. Recently, a novel two-phase framework has been proposed for inverse QSAR/QSPR. In the first phase, an artificial neural network (ANN) is used to construct a prediction function; in the second phase, a mixed integer linear program (MILP) formulated on the trained ANN and a graph search algorithm are used to infer desired chemical structures. So far, the framework has been applied to chemical compounds with cycle index up to 2. Computational results on instances with n non-hydrogen atoms show that a feature vector can be inferred by solving an MILP for up to n=40, whereas graphs can be enumerated only for up to n=15. When the framework was applied to chemical acyclic graphs, the maximum computable diameter of a chemical structure was 8. In this paper, we introduce a new characterization of graph structure, called “branch-height”, based on which a new MILP formulation and a new graph search algorithm are designed for chemical acyclic graphs. The results of computational experiments using chemical properties such as octanol/water partition coefficient, boiling point and heat of combustion suggest that the proposed method can infer chemical acyclic graphs with around n=50 and diameter 30.
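The abstract measures instance size by the number n of non-hydrogen atoms and, for acyclic graphs, by the diameter. A chemical acyclic graph with hydrogens suppressed is a tree, so its diameter (the number of edges on a longest shortest path between two atoms) can be computed with two breadth-first searches: the atom farthest from an arbitrary start atom is always an endpoint of a longest path. The sketch below is illustrative only; it assumes the graph is given as an edge list over atom indices, and the function names are hypothetical rather than taken from the paper.

```python
from collections import deque


def bfs_farthest(adj, start):
    """Return (farthest vertex, its distance in edges) from start via breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    far, far_d = start, 0
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                if dist[w] > far_d:
                    far, far_d = w, dist[w]
                queue.append(w)
    return far, far_d


def tree_diameter(edges):
    """Diameter (in edges) of a connected acyclic graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    if not adj:
        return 0  # no edges: a single atom (or empty graph) has diameter 0
    start = next(iter(adj))
    a, _ = bfs_farthest(adj, start)  # a is an endpoint of some longest path
    _, d = bfs_farthest(adj, a)      # the farthest distance from a is the diameter
    return d
```

For example, the carbon skeleton of 2-methylbutane, with edges (1, 2), (2, 3), (3, 4), (2, 5), has diameter 3, and the six-carbon chain of n-hexane has diameter 5.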

Highlights

  • In computational molecular biology, various types of data have been utilized, including sequences, gene expression patterns, and protein structures.

  • We implemented our method of Stages 1 to 5 for inferring chemical acyclic graphs and conducted experiments to evaluate its computational efficiency for three chemical properties π: octanol/water partition coefficient (Kow), boiling point (Bp) and heat of combustion (Hc).

  • For each property π ∈ {Kow, Bp, Hc}, we first selected a set of chemical elements and then collected a data set Dπ of chemical acyclic graphs over that set of elements from the Hazardous Substances Data Bank (HSDB) of PubChem.
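Before a prediction function can be trained on a data set Dπ, each chemical acyclic graph has to be mapped to a feature vector. The exact descriptors are not listed in this summary, so the sketch below only assumes a few plausible counts (number of non-hydrogen atoms, atoms per element, bonds per multiplicity, atoms per degree). The input representation and the function name are hypothetical, not the paper's descriptor set.

```python
from collections import Counter


def simple_descriptors(atoms, bonds):
    """Illustrative descriptor dict for a hydrogen-suppressed acyclic chemical graph.

    atoms: list of element symbols, indexed by vertex number
    bonds: list of (u, v, multiplicity) tuples
    """
    n = len(atoms)
    # Sanity check only: a tree on n vertices has n - 1 edges (necessary, not sufficient).
    if n > 0 and len(bonds) != n - 1:
        raise ValueError("expected a connected acyclic graph (n - 1 bonds)")
    features = {"n": n}
    # Number of atoms of each chemical element.
    for element, count in Counter(atoms).items():
        features[f"count_{element}"] = count
    # Number of bonds of each multiplicity (single, double, triple).
    for mult, count in Counter(m for _, _, m in bonds).items():
        features[f"bond_{mult}"] = count
    # Number of atoms of each degree in the hydrogen-suppressed graph.
    degrees = Counter()
    for u, v, _ in bonds:
        degrees[u] += 1
        degrees[v] += 1
    for d, count in Counter(degrees[i] for i in range(n)).items():
        features[f"deg_{d}"] = count
    return features
```

For instance, acetic acid, written as C-C(=O)-O without hydrogens, gives n = 4, two carbons, two oxygens, two single bonds, one double bond, three atoms of degree 1 and one of degree 3.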


Summary

Introduction

Azam et al Algorithms Mol Biol (2021) 16:18

In computational molecular biology, various types of data have been utilized, including sequences, gene expression patterns, and protein structures. Graph-structured data have also been used extensively, including metabolic pathways, protein-protein interaction networks, gene regulatory networks, and chemical graphs. Much attention has recently been paid to the analysis of chemical graphs due to its potential applications to computer-aided drug design. One of the major approaches to computer-aided drug design is quantitative structure activity/property relationship (QSAR/QSPR) analysis, the purpose of which is to derive quantitative relationships between chemical structures and their activities/properties. Inverse QSAR/QSPR, which aims to infer chemical structures from given chemical activities/properties, has been extensively studied [1, 2]. It is often formulated as an optimization problem: find a chemical structure that maximizes (or minimizes) an objective function under various constraints.
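To make the optimization view concrete: given a trained prediction function f and a target range [y_low, y_high] for a property, the inverse step asks for a feature vector x with y_low ≤ f(x) ≤ y_high. The framework described in the abstract solves this with an MILP formulated on the trained ANN, followed by a graph search. The sketch below instead swaps in plain enumeration over a small integer box, which only works at toy sizes. The two-input ReLU network, its weights, and the bounds are all hypothetical.

```python
from itertools import product

# Hypothetical weights for illustration, not a trained model from the paper.
# Architecture: 2 inputs -> 2 ReLU hidden units -> 1 linear output.
W1 = [[1.0, -1.0],
      [0.5, 2.0]]
B1 = [0.0, -1.0]
W2 = [2.0, 1.0]
B2 = 0.5


def relu(z):
    return max(0.0, z)


def predict(x):
    """Forward pass of the toy network on feature vector x."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def invert(y_low, y_high, bounds):
    """Enumerate integer feature vectors within bounds whose prediction lies in [y_low, y_high].

    bounds: list of (lo, hi) integer pairs, one per feature (inclusive).
    Brute force stands in for the MILP here; cost grows exponentially with the dimension.
    """
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    return [x for x in product(*ranges) if y_low <= predict(x) <= y_high]
```

With these weights, invert(2.0, 3.0, [(0, 2), (0, 2)]) returns [(1, 0), (1, 1)], whose predictions are 2.5 and 2.0. A feasible feature vector is only half of the task: a chemical graph that realizes it must still be found, which is the role of the graph search phase.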
