Abstract

In the field of machine learning and cheminformatics, the prediction of molecular properties holds significant importance. Molecules can be represented in various formats, including 1D SMILES string, 2D graph, and 3D conformation. Numerous models have been proposed for different representations to accomplish molecular property prediction. However, most recent works have focused on one or two representations or combining embedding vectors from different perspectives in an unsophisticated manner. To address this issue, we present PremuNet, a novel pre-trained multi-representation fusion network for molecular property prediction. PremuNet can extract comprehensive molecular information from multiple views and combine them interactively through pre-training and fine-tuning. The framework of PremuNet consists of two branches: a Transformer-GNN branch that extracts SMILES and graph information, and a Fusion Net branch that extracts topology and geometry information, called PremuNet-L and PremuNet-H respectively. We employ masked self-supervised methods to enable the model to learn information fusion and achieve enhanced performance in downstream tasks. The proposed model has been evaluated on eight molecular property prediction tasks, including five classification and three regression tasks, and attained state-of-the-art performance in most cases. Additionally, we conduct the ablation studies to demonstrate the effect of each view and the branch combination approaches.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call