Inverse Molecule Design with Invertible Neural Networks as Generative Models

Wei Hu

doi:10.4236/jbise.2021.147026

Abstract

Using neural networks for supervised learning means learning a function that maps input x to output y. In many applications, however, the inverse mapping is also wanted, i.e., inferring x from y, which requires the learned function to be invertible. Since the dimension of the input is usually much higher than that of the output, information is lost in the forward mapping from input to output, which makes creating invertible neural networks difficult. Recent developments in invertible learning techniques such as normalizing flows have nevertheless made invertible neural networks a reality. In this work, we applied flow-based invertible neural networks as generative models to inverse molecule design. In this context, forward learning predicts chemical properties given a molecule, and inverse learning infers molecules given the chemical properties. Trained on 100 and 1000 molecules, respectively, from the benchmark dataset QM9, our model identified novel molecules whose chemical property values well exceeded the limits of the training molecules as well as the limits of the whole QM9 dataset of 133,885 molecules. Moreover, our generative model could easily sample many molecules (x values) from any one chemical property value (y value). Compared with the previous method in the literature, which could only optimize one molecule for one chemical property value at a time, our model could be trained once and then sampled any number of times and for any chemical property value without retraining. This advantage comes from treating inverse molecule design as an inverse regression problem. In summary, our main contributions were twofold: 1) our model generalized well from the training data and was very data efficient; 2) our model learned a bidirectional correspondence between molecules and their chemical properties, thereby offering the ability to sample any number of molecules from any y value. In conclusion, our findings revealed the efficiency and effectiveness of using invertible neural networks as generative models in inverse molecule design.

Highlights

Machine learning can be divided into three major categories: supervised learning, unsupervised learning, and reinforcement learning.
We proposed to apply invertible neural networks to the task of inverse molecule design, inspired by the work in [1], where the main idea was to tune the input toward a target value of a chemical property while keeping the weights of the neural network frozen after forward learning was finished.
Machine learning has shown its potential in discovering novel drug-like molecules from a virtually infinite search space, which suggests that human intelligence and artificial intelligence are both needed in the smart search for new drugs.

Summary

Introduction

Machine learning can be divided into three major categories: supervised learning, unsupervised learning, and reinforcement learning. All three classes of learning have found successful applications in molecule design. In the setting of supervised learning, the model learns a function that maps input x to output y, where x represents the features of a molecule and y the chemical properties of the molecule. Because the dimension of the input is usually much higher than that of the output, information can be lost when it flows from a higher-dimensional space to a lower-dimensional one. Inverse learning is much needed in cheminformatics, as it is very important to infer molecules from their chemical properties [1].
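To make the idea of an invertible mapping concrete, the following is a minimal sketch of a flow built from affine coupling layers, a common building block of normalizing flows. It is an illustration only and not the architecture used in this work. The names AffineCoupling, flow_forward, and flow_inverse are hypothetical, and the weights are random and untrained. The sketch also assumes one particular convention for avoiding information loss: the output has the same dimension as the input, its first coordinate is read as the property y, and the remaining coordinates are treated as a latent vector z.

```python
import numpy as np


class AffineCoupling:
    """Hypothetical affine coupling layer; invertible by construction."""

    def __init__(self, dim, rng):
        self.d = dim // 2  # size of the half that passes through unchanged
        # Random, untrained weights for the scale and shift maps (assumption).
        self.W_s = rng.normal(scale=0.1, size=(self.d, dim - self.d))
        self.W_t = rng.normal(scale=0.1, size=(self.d, dim - self.d))

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s = np.tanh(x1 @ self.W_s)  # bounded log-scale
        t = x1 @ self.W_t           # shift
        return np.concatenate([x1, x2 * np.exp(s) + t], axis=1)

    def inverse(self, u):
        u1, u2 = u[:, :self.d], u[:, self.d:]
        s = np.tanh(u1 @ self.W_s)  # same s and t, since u1 equals x1
        t = u1 @ self.W_t
        return np.concatenate([u1, (u2 - t) * np.exp(-s)], axis=1)


def flow_forward(x, layers):
    # Reverse the columns after each layer so both halves get transformed.
    for layer in layers:
        x = layer.forward(x)[:, ::-1]
    return x


def flow_inverse(u, layers):
    # Undo the column reversal, then invert each layer in reverse order.
    for layer in reversed(layers):
        u = layer.inverse(u[:, ::-1])
    return u


rng = np.random.default_rng(0)
dim = 4
layers = [AffineCoupling(dim, rng) for _ in range(2)]

# Round trip: the inverse recovers the input up to floating-point error.
x = rng.normal(size=(3, dim))
x_rec = flow_inverse(flow_forward(x, layers), layers)
max_err = np.max(np.abs(x - x_rec))

# One target y, several latent draws z: several different x values.
y_target = 1.5
z = rng.normal(size=(5, dim - 1))
u_samples = np.concatenate([np.full((5, 1), y_target), z], axis=1)
x_samples = flow_inverse(u_samples, layers)

# Mapping the samples forward again returns the same y for every one of them.
y_back = flow_forward(x_samples, layers)[:, 0]
```

Because every layer has an exact inverse, fixing y and redrawing z yields as many distinct inputs as desired, all of which map back to the same y. In a trained model, x would encode a molecule and y a chemical property; here both are plain random vectors.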

Objectives

Methods

Results

Conclusion