A Novel Method for Inferring Chemical Compounds With Prescribed Topological Substructures Based on Integer Programming.

Jianshen Zhu, Hiroshi Nagamochi, Fan Zhang, Kazuya Haraguchi, Tatsuya Akutsu, Liang Zhao, Naveed Ahmed Azam, Aleksandar Shurbevski

doi:10.1109/tcbb.2021.3112598

Abstract

Drug discovery is one of the major goals of computational biology and bioinformatics. A novel framework has recently been proposed for the design of chemical graphs using both artificial neural networks (ANNs) and mixed integer linear programming (MILP). This method consists of a prediction phase and an inverse prediction phase. In the first phase, an ANN is trained using data on existing chemical compounds. In the second phase, given a target chemical property, a feature vector is inferred by solving an MILP formulated from the trained ANN, and then a set of chemical structures is enumerated by a graph enumeration algorithm. Although exact solutions are guaranteed by this framework, the types of chemical graphs have been restricted to such classes as trees, monocyclic graphs, and graphs with a specified polymer topology with cycle index up to 2. To overcome this limitation on the topological structure, we propose a new flexible modeling method for the framework, so that we can specify a topological substructure of graphs and a partial assignment of chemical elements and bond multiplicities to a target graph. The results of computational experiments suggest that the proposed system can infer chemical graphs with up to around 50 non-hydrogen atoms.
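The abstract does not spell out how a trained ANN is turned into MILP constraints. As an illustration only, the sketch below checks one common textbook encoding (the "big-M" formulation) of a single ReLU unit y = max(0, a), where a is the pre-activation and z is a binary variable. The ReLU activation, the bounds lo <= a <= hi, and all function names are assumptions made for this example; they are not taken from the paper and need not match the authors' formulation.

```python
from itertools import product

def relu_bigm_feasible(a, y, z, lo, hi, tol=1e-9):
    """Return True if (a, y, z) satisfies the big-M ReLU constraints.

    Assumes lo <= a <= hi with lo <= 0 <= hi (hypothetical bounds).
    z = 1 means the unit is active (y = a); z = 0 means inactive (y = 0).
    """
    return (
        y >= -tol                          # y >= 0
        and y >= a - tol                   # y >= a
        and y <= a - lo * (1 - z) + tol    # y <= a - lo * (1 - z)
        and y <= hi * z + tol              # y <= hi * z
    )

def feasible_outputs(a, lo, hi, candidates):
    """List every candidate y that is feasible for some binary z."""
    return sorted(
        {y for y, z in product(candidates, (0, 1))
         if relu_bigm_feasible(a, y, z, lo, hi)}
    )

# Candidate outputs on a grid of step 0.5 in [-4, 4] (illustrative only).
CANDIDATES = [i / 2 for i in range(-8, 9)]

if __name__ == "__main__":
    for a in (-1.5, 0.0, 2.0):
        print(a, feasible_outputs(a, -4.0, 4.0, CANDIDATES))
```

For each fixed pre-activation value, the only feasible output is max(0, a), which is why linear constraints plus one binary variable per unit are enough to represent such a network exactly inside an MILP. In a real model, a would itself be a linear expression in the previous layer's variables, and a solver would choose z.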

Full Text