Molecular property prediction by contrastive learning with attention-guided positive sample selection

Jinxian Wang, Jihong Guan, Shuigeng Zhou, Jonathan Wren

doi:10.1093/bioinformatics/btad258

Open Access

Journal: Bioinformatics
Publication Date: Apr 20, 2023
Citations: 3
License type: CC BY 4.0

Affiliation: Tongji University, Fudan University

Abstract

Predicting molecular properties is a fundamental problem in drug design and discovery. In recent years, self-supervised learning (SSL) has shown promising performance in image recognition, natural language processing, and single-cell data analysis. Contrastive learning (CL) is a typical SSL method that learns features of the data so that the trained model can distinguish samples more effectively. An important issue in CL is how to select positive samples for each training example, a choice that significantly affects performance. In this article, we propose a new method for molecular property prediction (MPP): Contrastive Learning with Attention-guided Positive-sample Selection (CLAPS). First, we generate positive samples for each training example using an attention-guided selection scheme. Second, we employ a Transformer encoder to extract latent feature vectors and compute a contrastive loss that distinguishes positive sample pairs from negative ones. Finally, we use the trained encoder to predict molecular properties. Experiments on various benchmark datasets show that our approach outperforms state-of-the-art (SOTA) methods in most cases. The code is publicly available at https://github.com/wangjx22/CLAPS.
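
To make the contrastive objective mentioned in the abstract concrete, the following is a minimal sketch of a generic InfoNCE-style loss over a batch of embeddings. The abstract does not specify the exact loss used by CLAPS, so the loss form, the function name `info_nce_loss`, the `temperature` parameter, and the use of in-batch negatives are all assumptions made for illustration. The random vectors stand in for encoder outputs.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE-style loss (an assumed form, not taken from the paper).

    Row i of `positives` is treated as the positive sample for row i of
    `anchors`; every other row in the batch serves as a negative.
    """
    # L2-normalize so the dot product equals cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarities scaled by the temperature, shape (batch, batch).
    logits = a @ p.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching (positive) pair sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

# Hypothetical stand-ins for latent vectors of 8 molecules and their positives.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
z_pos = z + 0.01 * rng.normal(size=(8, 16))
loss = info_nce_loss(z, z_pos)
```

The loss is small when each anchor is most similar to its own positive and grows when positives are mismatched, which is why the choice of positive samples matters for training.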

Full Text