Abstract
Chemical space is routinely explored by machine learning methods to discover interesting molecules before time-consuming experimental synthesis is attempted. However, these methods often rely on a graph representation, ignoring the 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning (RL) approach for generating molecules in Cartesian coordinates, allowing for quantum chemical prediction of their stability. To improve sample efficiency, we learn basic chemical rules from imitation learning (IL) on the GDB-11 database to create an initial model applicable to all stoichiometries. We then deploy multiple copies of the model, each conditioned on a specific stoichiometry, in an RL setting. The models correctly identify low-energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how RL further refines the IL model in domains far from the training data.
Highlights
Discovering novel molecules or materials with desirable properties is a challenging task because of the immense size of chemical compound space
Whereas generative models rely on databases for pretraining, reinforcement learning (RL) is usually done without any prior knowledge, which makes it inefficient at first while basic chemical and physical rules are learned
In this work, we build upon a previous reinforcement learning algorithm, the Atomistic Structure Learning Algorithm (ASLA)[34], by incorporating databases into molecular RL to improve sample efficiency while simultaneously allowing the machine learning (ML) model to learn beyond the knowledge contained in the database
Summary
Discovering novel molecules or materials with desirable properties is a challenging task because of the immense size of chemical compound space. Further complicating matters is the costly and time-consuming process of synthesizing and testing proposed structures. Whereas this procedure was historically driven by trial and error, advances in computational quantum chemical methods allow for initial screening to select promising molecules for experimental testing. An added benefit of virtual screening is the creation of numerous databases containing structures with computed chemical and physical properties[10,11,12,13,14], which has led to generative models for the discovery of novel molecules and materials[15,16,17,18,19,20,21,22,23,24,25,26]. We apply the method to larger molecules outside the training distribution.