QSAR modeling without descriptors using graph convolutional neural networks: the case of mutagenicity prediction.

Chiakang Hung,Giuseppina Gini

doi:10.1007/s11030-021-10250-2

Abstract

Deep neural networks are effective in learning directly from low-level encoded data without the need of feature extraction. This paper shows how QSAR models can be constructed from 2D molecular graphs without computing chemical descriptors. Two graph convolutional neural network-based models are presented with and without a Bayesian estimation of the prediction uncertainty. The property under investigation is mutagenicity: Models developed here predict the output of the Ames test. These models take the SMILES representation of the molecules as input to produce molecular graphs in terms of adjacency matrices and subsequently use attention mechanisms to weight the role of their subgraphs in producing the output. The results positively compare with current state-of-the-art models. Furthermore, our proposed model interpretation can be enhanced by the automatic extraction of the substructures most important in driving the prediction, as well as by uncertainty estimations.

Full Text