Abstract

Solubility is not only a significant physical property of molecules but also a vital factor in small-molecule drug development. Measuring drug solubility experimentally demands stringent equipment, controlled environments, and substantial human and material resources, so accurate computational prediction of solubility has long been a goal for researchers. In this study, we introduce MSCSol, a solubility prediction model that integrates multidimensional molecular structure information. We incorporate a graph neural network with geometric vector perceptrons (GVP-GNN) to encode 3D molecular structures, representing the spatial arrangement and orientation of atoms as well as atomic sequences and interactions. We also employ Selective Kernel Convolution combined with Global and Local attention mechanisms to capture the context of molecular features at different scales. Additionally, various descriptors are calculated to enrich the molecular representation. For the 2D and 3D structural data of molecules, we design separate data augmentation strategies to enhance generalization and prevent the model from learning irrelevant information. Extensive experiments on benchmark and independent datasets demonstrate MSCSol's superior performance. Ablation studies further confirm the effectiveness of the individual modules. Interpretability analysis highlights the importance of various atomic groups and substructures for solubility and verifies that our model effectively captures functional molecular structures and higher-order knowledge. The source code and datasets are freely available at https://github.com/ZiyuFanCSU/MSCSol.
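
To illustrate what a 3D augmentation of molecular structure can look like, the following minimal sketch applies a random rigid rotation to a set of atomic coordinates. The abstract does not specify MSCSol's augmentation strategies, so this is an assumption chosen for illustration and not the authors' method. The function names `random_rotation_matrix`, `rotate_coords`, and `pairwise_distances`, as well as the water-like coordinates, are hypothetical. A rigid rotation changes the orientation of the molecule but leaves all interatomic distances unchanged, which is why it is a common way to discourage a model from relying on an arbitrary coordinate frame.

```python
import math
import random


def random_rotation_matrix(rng):
    """Build a 3x3 rotation matrix from a random axis and angle (Rodrigues' formula).

    Note: a random axis with a uniform angle is a valid rotation, but it is not
    a uniform sample over all 3D rotations. That is sufficient for this sketch.
    """
    # Draw a random unit axis by normalizing a Gaussian sample.
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:
            break
    x, y, z = x / norm, y / norm, z / norm

    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def rotate_coords(coords, rng=None):
    """Return a rotated copy of a list of (x, y, z) atomic coordinates."""
    rng = rng or random.Random()
    rot = random_rotation_matrix(rng)
    return [
        tuple(sum(rot[i][j] * p[j] for j in range(3)) for i in range(3))
        for p in coords
    ]


def pairwise_distances(coords):
    """Return all interatomic distances, in a fixed (i < j) order."""
    return [
        math.dist(coords[i], coords[j])
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
    ]


# Hypothetical water-like geometry (illustrative values, in angstroms).
example_coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
augmented_coords = rotate_coords(example_coords, random.Random(0))
```

Because only the orientation changes, `pairwise_distances(augmented_coords)` matches `pairwise_distances(example_coords)` up to floating-point error, so the augmented sample describes the same molecule in a different frame.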

