Abstract

Accurate and efficient molecular property prediction is one of the fundamental problems in drug research and development. Recent advances in representation learning have been shown to greatly improve the performance of molecular property prediction. However, due to limited labeled data, supervised molecular representation algorithms can explore only a limited chemical space and suffer from poor generalizability. In this work, we propose a self-supervised learning method, ATMOL, for molecular representation learning and property prediction. We developed a novel molecular graph augmentation strategy, referred to as attention-wise graph masking, to generate challenging positive samples for contrastive learning. We adopted the graph attention network as the molecular graph encoder and leveraged the learned attention weights as masking guidance to generate augmented molecular graphs. By minimizing the contrastive loss between the original and augmented graphs, our model can capture important molecular structure and higher-order semantic information. Extensive experiments showed that our attention-wise graph-mask contrastive learning achieved state-of-the-art performance on several downstream molecular property prediction tasks. We also verified that pretraining our model on a larger scale of unlabeled data improved the generalization of the learned molecular representations. Moreover, visualization of the attention heatmaps revealed meaningful patterns indicative of atoms and atomic groups important to specific molecular properties.
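The two ingredients named above, attention-guided masking of high-attention atoms and a contrastive loss between the original and augmented graph embeddings, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the masking ratio, the function names, and the choice of the NT-Xent loss (a common contrastive objective) are assumptions.

```python
import numpy as np

def attention_wise_mask(attn_weights, mask_ratio=0.3):
    """Pick the highest-attention atoms to mask in the augmented graph.

    attn_weights: 1-D array with one learned attention score per atom
    (here assumed to come from a graph attention network encoder).
    Returns a boolean array where True marks a masked atom.
    """
    n_mask = max(1, int(len(attn_weights) * mask_ratio))
    top = np.argsort(attn_weights)[-n_mask:]   # atoms the encoder attends to most
    mask = np.zeros(len(attn_weights), dtype=bool)
    mask[top] = True
    return mask

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss between two batches of graph embeddings.

    z1[i] and z2[i] are embeddings of the original and augmented views
    of the same molecule; all other pairs in the batch act as negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)        # stack both views: 2N x d
    sim = z @ z.T / temperature                 # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)              # exclude self-similarity
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # positive pair index
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()
```

Masking the atoms the encoder attends to most yields harder positive samples than random masking, which is the intuition behind the "challenging positive samples" claim: the loss is low only when the encoder can still recognize the molecule from its less salient structure.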
