Abstract

Molecular optimization, which transforms a given input molecule X into another molecule Y with desired properties, is essential in molecular drug discovery. Traditional approaches either suffer from sample-inefficient learning or ignore information that can be captured by supervised learning on optimized molecule pairs. In this study, we present a novel molecular optimization paradigm, Graph Polish. In this paradigm, guided by source and target molecule pairs with the desired properties, a heuristic optimization solution can be derived: given an input molecule, we first predict which atom can be viewed as the optimization center, and then the nearby regions are optimized around this center. We then propose an effective and efficient learning framework, Teacher and Student polish, to capture the dependencies in the optimization steps. A teacher component automatically identifies and annotates the optimization centers and the preservation, removal, and addition of parts of the molecules; a student component learns this knowledge and applies it to a new molecule. The proposed paradigm can offer an intuitive interpretation of the molecular optimization result. Experiments with multiple optimization tasks are conducted on several benchmark datasets. The proposed approach achieves a significant advantage over six state-of-the-art baseline methods. In addition, extensive studies are conducted to validate the effectiveness, explainability, and time savings of the novel optimization paradigm.

Highlights

  • Introducing a new drug into the market costs over one billion USD and takes an average of 13 years [1], [2].

  • With the guidance of the source and target molecule pairs of the desired properties, we first predict which atom can be viewed as the optimization center, and the nearby regions are optimized around this center

  • We compare our approach with the following state-of-the-art baselines: the variational junction tree encoder–decoder (VJTNN), GVJTNN, and the copy&refine strategy (CORE), which, like the proposed method, require supervised molecular pairs; and Molecule Deep Q-Networks (MolDQN), GCPN, and the junction tree variational autoencoder (JTVAE), which need no such supervision.


Summary

INTRODUCTION

Introducing a new drug into the market costs over one billion USD and takes an average of 13 years [1], [2]. As shown, by differentiating between the source and target molecules we can derive a heuristic optimization solution: the appropriate substructures (the blue area) are first identified and preserved, and the surrounding context (the yellow area) is transformed. In this way, we can leverage the information in the source molecules to reduce the number of generation steps toward the target molecules and to guide the subsequent generation steps as prior knowledge. Inspired by these observations, in this study we present a novel molecular optimization paradigm, Graph Polish. In this paradigm, guided by source and target molecule pairs with the desired properties, we first predict which atom can be viewed as the optimization center, and the nearby regions are then optimized around this center. These optimization steps naturally offer a reference for researchers to understand the process of molecular optimization.
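
The idea of differencing a source and target molecule to find what is preserved, removed, and added can be illustrated with a toy sketch. This is not the authors' teacher component: it assumes that atoms in the two molecules are already aligned and share IDs (a real pipeline would have to compute that alignment), it represents each molecule as a plain list of bonds, and the function names `annotate_pair` and `candidate_centers` are hypothetical.

```python
def annotate_pair(source_bonds, target_bonds):
    """Label atoms as preserved, removed, or added.

    Assumption: atoms are pre-aligned, so the same ID in both
    molecules refers to the same atom.
    """
    src_atoms = {atom for bond in source_bonds for atom in bond}
    tgt_atoms = {atom for bond in target_bonds for atom in bond}
    return {
        "preserved": src_atoms & tgt_atoms,
        "removed": src_atoms - tgt_atoms,
        "added": tgt_atoms - src_atoms,
    }


def candidate_centers(source_bonds, target_bonds):
    """Return preserved atoms that touch a bond present in only one molecule.

    These are toy stand-ins for "optimization centers": the points
    where the preserved substructure meets the edited region.
    """
    preserved = annotate_pair(source_bonds, target_bonds)["preserved"]
    src = {frozenset(bond) for bond in source_bonds}
    tgt = {frozenset(bond) for bond in target_bonds}
    changed_bonds = src ^ tgt  # bonds that exist in exactly one molecule
    return sorted(
        {atom for bond in changed_bonds for atom in bond if atom in preserved}
    )


# Toy pair: a three-carbon chain whose terminal O4 is replaced by N5.
source = [("C1", "C2"), ("C2", "C3"), ("C3", "O4")]
target = [("C1", "C2"), ("C2", "C3"), ("C3", "N5")]

labels = annotate_pair(source, target)
centers = candidate_centers(source, target)
print(labels["removed"], labels["added"], centers)  # {'O4'} {'N5'} ['C3']
```

In this toy pair the chain C1, C2, C3 is preserved, O4 is removed, N5 is added, and C3 is the only preserved atom adjacent to the edit, so it plays the role of the center around which the nearby region changes.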

Generative Model of Molecules
Supervised Learning-Based Molecule Generation
PRELIMINARIES AND PROBLEM FORMULATION
METHOD
Teacher Component
Student Component
EXPERIMENTS
Baselines
Metrics
Implementation Details
Results
Scalability
Generalizability
Extensive Study
Findings
CONCLUSION AND FUTURE WORK
