Abstract

In de novo molecular design, recurrent neural networks (RNNs) have been shown to be effective for sampling and generating novel chemical structures. Using reinforcement learning (RL), an RNN can be tuned with a scoring function to target a particular region of chemical space with optimized desirable properties. However, ligands generated by current RL methods tend to have relatively low diversity, and optimization towards desired properties sometimes even yields duplicate structures. Here, we propose a new method to address the low-diversity issue in RL for molecular design. Memory-assisted RL extends standard RL by introducing a so-called memory unit. As a proof of concept, we applied our method to generate structures with a desired AlogP value. In a second case study, we applied our method to design ligands for the dopamine type 2 receptor and the 5-hydroxytryptamine type 1A receptor. For each receptor, a machine learning model was developed to predict whether generated molecules were active against that receptor. In both case studies, memory-assisted RL generated more compounds predicted to be active, with higher chemical diversity, and thus achieved better coverage of the chemical space of known ligands than established RL methods.

Highlights

  • Over the last few years, machine learning, and in particular deep learning, has led to numerous breakthroughs in the fields of computer vision, speech recognition, and medical diagnosis

  • The results showed that the differentiable neural computer (DNC)-based generative adversarial network (GAN) generated Simplified Molecular Input Line Entry System (SMILES) strings that were longer, more diverse, and more complex than the SMILES generated by ORGANIC

  • We developed the memory unit to address a common issue in reinforcement learning (RL): the generated compounds often lack chemical diversity because of so-called policy collapse
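The text above does not spell out how the memory unit works, so the following is only a hypothetical sketch of one way such a unit could discourage policy collapse: it remembers high-scoring molecules per group and zeroes the score of duplicates or of further members of a group that is already full. The class name `MemoryUnit`, the parameters `bucket_size` and `min_score`, and the idea of a precomputed `group_key` (for example, a scaffold string) are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


class MemoryUnit:
    """Hypothetical memory unit (an assumption, not the published method).

    Keeps a bucket of already-rewarded molecules per group key and returns
    an adjusted score that removes the reward for repeats.
    """

    def __init__(self, bucket_size=2, min_score=0.5):
        self.bucket_size = bucket_size  # assumed cap on molecules per group
        self.min_score = min_score      # assumed threshold for remembering
        self.buckets = defaultdict(set)

    def adjust(self, smiles, group_key, score):
        """Return the score to hand to the RL update for this molecule."""
        # Low-scoring molecules are neither stored nor penalized.
        if score < self.min_score:
            return score
        bucket = self.buckets[group_key]
        # Exact duplicates earn nothing.
        if smiles in bucket:
            return 0.0
        # A full bucket means this region has been explored enough.
        if len(bucket) >= self.bucket_size:
            return 0.0
        bucket.add(smiles)
        return score


memory = MemoryUnit(bucket_size=2, min_score=0.5)
first = memory.adjust("CCO", "group-1", 0.9)     # novel: keeps its score
repeat = memory.adjust("CCO", "group-1", 0.9)    # duplicate: score removed
```

In a training loop, the adjusted score would replace the raw scoring-function output, so the generator gains nothing from resampling regions it has already filled.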

Summary

Introduction

Over the last few years, machine learning, and in particular deep learning, has led to numerous breakthroughs in the fields of computer vision, speech recognition, and medical diagnosis.

To analyze whether the use of the memory unit resulted in the generation of compounds unrelated to the training set, we investigated analog relationships between the generated compounds and the training set using count-based ECFP6 Tanimoto similarity and the matched molecular pair (MMP) formalism [49]. The generated structures with predicted LogP between 2.0 and 3.0 were compared with the 487 training compounds on the basis of fingerprint similarity (count-based ECFP6) and their MMP relationships.
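As a rough illustration of the similarity comparison described above, the sketch below computes a count-based Tanimoto coefficient, taken here as the sum of per-feature minima divided by the sum of per-feature maxima. It assumes each fingerprint is already available as a mapping from feature identifier to count; in practice these counts would come from a cheminformatics toolkit, and the function names and toy fingerprints are hypothetical.

```python
def count_tanimoto(fp_a, fp_b):
    """Count-based Tanimoto similarity between two {feature: count} mappings."""
    features = set(fp_a) | set(fp_b)
    # Shared counts: the smaller count of each feature.
    shared = sum(min(fp_a.get(f, 0), fp_b.get(f, 0)) for f in features)
    # Total counts: the larger count of each feature.
    total = sum(max(fp_a.get(f, 0), fp_b.get(f, 0)) for f in features)
    # Two empty fingerprints are treated as dissimilar by convention here.
    return shared / total if total else 0.0


def nearest_training_similarity(query_fp, training_fps):
    """Highest similarity of one generated compound to any training compound."""
    return max((count_tanimoto(query_fp, t) for t in training_fps), default=0.0)


# Toy fingerprints with made-up feature identifiers (illustration only).
generated = {101: 2, 202: 1}
training_set = [{101: 1, 303: 1}, {101: 2, 202: 1, 404: 1}]
best = nearest_training_similarity(generated, training_set)
```

A generated compound whose best similarity to all training compounds is low would count as unrelated to the training set under this measure; the cutoff used for that judgment is not given in the text above.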
