Abstract

Molecular fingerprints are essential cheminformatics tools for machine learning, with applications in drug discovery. Standard fingerprint software computes fixed-size feature vectors and uses them as inputs to a deep neural network or other machine learning methods. A fixed-size fingerprint representation of molecules, however, requires extremely large vectors to encode all possible substructures. Limited accuracy and poor interpretability also arise because the underlying neural network emphasizes particular and exclusive aspects of the molecular structure. In this study, we develop a novel graph convolutional network (GCN) to predict the binding free energy of protein-ligand complexes. Adding a physics-based layer to the network architecture improves the accuracy of the molecular fingerprint while avoiding potential overfitting. In addition to standard information about the substructures, our hybrid physics-data model encodes atomic bond features, which capture structural features and enable further analysis. It has been shown that machine-optimized fingerprints, compared to fixed-size fingerprints, can provide more accurate predictions, better performance, and more interpretable results. We show that the proposed GCN fingerprint outperforms standard fingerprints in predicting the binding free energy of host-guest systems and of complexes in the PDBbind database.

