Background: Aqueous solubility (AS) is a critical factor in drug discovery (DD), directly influencing a drug’s bioavailability and overall efficacy. Despite advances in machine learning, accurate prediction of AS, which is essential for improving the pharmacokinetics and formulation of new compounds, remains a challenge. Methods: This study develops an enhanced ResNet50 deep learning architecture for predicting the AS of drug compounds. Deep-net models with 8-, 16-, and 20-layer ResNet50 convolutional neural network (CNN) architectures were built. A dataset of 9,532 drug compounds, represented by molecular footprints, was used to train the models, and ten-fold cross-validation was used to optimize predictive performance. Results: The 20-layer ResNet50 model outperformed human experts and shallower models, achieving an R² of 0.423 and an RMSE of 0.678. The model also achieved an aqueous solubility prediction (ASP) accuracy of 90.6%, surpassing the predictions of human experts and simpler neural network models. Conclusion: Deeper-net architectures, particularly the 20-layer ResNet50 model, offer superior performance in predicting AS. These deep learning models provide a reliable and efficient approach to solubility prediction, which is crucial for advancing drug discovery.
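The evaluation protocol named in the Methods and Results (ten-fold cross-validation scored with RMSE and R²) can be illustrated with a minimal sketch. It is not the study's code: the function names (`k_fold_indices`, `cross_validate`, `mean_baseline`), the synthetic solubility values, and the mean-value baseline standing in for the ResNet50 models are all assumptions made for illustration.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validate(y, fit_predict, k=10, seed=0):
    """Hold out each fold once; return a list of (rmse, r2) per fold.

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains
    on the training indices and returns predictions for the test indices.
    """
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        preds = fit_predict(train, held_out)
        truth = [y[i] for i in held_out]
        scores.append((rmse(truth, preds), r_squared(truth, preds)))
    return scores


# Synthetic stand-in for measured solubility values (assumption, not study data).
rng = random.Random(42)
solubility = [rng.gauss(-3.0, 2.0) for _ in range(200)]


def mean_baseline(train_idx, test_idx):
    """Placeholder model: predict the training-set mean for every test compound."""
    mean_value = sum(solubility[i] for i in train_idx) / len(train_idx)
    return [mean_value] * len(test_idx)


fold_scores = cross_validate(solubility, mean_baseline, k=10)
mean_rmse = sum(s[0] for s in fold_scores) / len(fold_scores)
mean_r2 = sum(s[1] for s in fold_scores) / len(fold_scores)
print(f"mean RMSE over folds: {mean_rmse:.3f}, mean R2 over folds: {mean_r2:.3f}")
```

A mean-value baseline gives an R² near or slightly below zero, which is a useful reference point when reading reported R² values; a real model would replace `mean_baseline` with a routine that trains on the training folds and predicts the held-out fold.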