Molecular property prediction plays an essential role in drug discovery by identifying candidate molecules with target properties. Deep learning models usually require sufficient labeled data to train accurate prediction models. However, labeled data for molecular property prediction are typically scarce, which poses a major challenge for deep learning-based methods. Furthermore, the global information of molecules is critical for predicting molecular properties. We therefore propose INTransformer, a data augmentation method based on contrastive learning that alleviates the scarcity of labeled molecular data while enhancing the ability to capture global information. Specifically, INTransformer consists of two identical Transformer sub-encoders that extract molecular representations from the original SMILES and a noisy SMILES respectively, thereby achieving data augmentation. To reduce the influence of the noise, we use contrastive learning to keep the encoding of the noisy SMILES consistent with that of the original input, so that INTransformer can better extract molecular representation information. Experiments on various benchmark datasets show that INTransformer achieves competitive performance on molecular property prediction tasks compared with baseline and state-of-the-art methods.
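The abstract describes a dual-encoder contrastive setup: the same Transformer encoder embeds both the original SMILES and a noise-perturbed copy, and a contrastive loss pulls the two encodings together. The following is a minimal sketch of that idea, not the authors' code: the tokenization, the noise scheme (`add_token_noise`), the NT-Xent-style loss, and all hyperparameters are illustrative assumptions, and the actual INTransformer details are in the full paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmilesEncoder(nn.Module):
    """Transformer sub-encoder mapping SMILES token ids to a pooled molecule vector."""

    def __init__(self, vocab_size=64, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))  # (B, L, d_model)
        return h.mean(dim=1)                     # mean-pool over token positions


def contrastive_loss(z_orig, z_noisy, temperature=0.1):
    """NT-Xent-style loss: matched (original, noisy) pairs are the positives."""
    z_orig = F.normalize(z_orig, dim=1)
    z_noisy = F.normalize(z_noisy, dim=1)
    logits = z_orig @ z_noisy.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(z_orig.size(0))       # diagonal entries are positives
    return F.cross_entropy(logits, targets)


def add_token_noise(token_ids, vocab_size=64, p=0.1):
    """Illustrative noise: randomly replace a fraction p of the tokens."""
    mask = torch.rand_like(token_ids, dtype=torch.float) < p
    random_tokens = torch.randint_like(token_ids, vocab_size)
    return torch.where(mask, random_tokens, token_ids)


if __name__ == "__main__":
    encoder = SmilesEncoder()               # the two identical sub-encoders are
    tokens = torch.randint(0, 64, (8, 40))  # modeled here as one shared module
    z_orig = encoder(tokens)
    z_noisy = encoder(add_token_noise(tokens))
    loss = contrastive_loss(z_orig, z_noisy)
    print(f"contrastive loss: {loss.item():.4f}")
```

In this sketch the contrastive term would be combined with a supervised property-prediction loss on the pooled representation; how the two losses are weighted is a detail of the paper, not shown here.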