Extracting expressive molecular features is essential for molecular property prediction. Sequence-based representation is a common representation of molecules, which ignores the structure information of molecules. While molecular graph representation has a weak ability in expressing the 3D structure. In this article, we try to make use of the advantages of different type representations simultaneously for molecular property prediction. Thus, we propose a fusion model named DLF-MFF, which integrates the multi-type molecular features. Specifically, we first extract four different types of features from molecular fingerprints, 2D molecular graph, 3D molecular graph and molecular image. Then, in order to learn molecular features individually, we use four essential deep learning frameworks, which correspond to four distinct molecular representations. The final molecular representation is created by integrating the four feature vectors and feeding them into prediction layer to predict molecular property. We compare DLF-MFF with 7 state-of-the-art methods on 6 benchmark datasets consisting of multiple molecular properties, the experimental results show that DLF-MFF achieves state-of-the-art performance on 6 benchmark datasets. Moreover, DLF-MFF is applied to identify potential anti-SARS-CoV-2 inhibitor from 2500 drugs. We predict probability of each drug being inferred as a 3CL protease inhibitor and also calculate the binding affinity scores between each drug and 3CL protease. The results show that DLF-MFF product better performance in the identification of anti-SARS-CoV-2 inhibitor. This work is expected to offer novel research perspectives for accurate prediction of molecular properties and provide valuable insights into drug repurposing for COVID-19.
Read full abstract