Abstract
The fast and accurate prediction of Hansen solubility benefits many diverse fields such as pharmaceuticals, the food industry, and cosmetics. To estimate the individual HSP values (polar, dispersive, and hydrogen bonding components), we investigated the performance of using Mordred descriptors in multiple linear regressions and XGBoost modeling. For HSP predictions, we also tested a graph-based molecular representation with graph neural network (GNN) modeling. To select the optimal models for final training and predictions, we used nested cross-validation and hyper-parameter optimization. The models with the best predictive performance were selected through internal (R2train, RMSE, MEPcv) and external (RMSEP, CCC, MEP, R2test, ar2m, Δr2m) validation metrics using ∼1200 compounds from free-available database https://www.stevenabbott.co.uk. To confirm the practical reliability, we examined the agreement of experimentally obtained HSP data from the literature for 93 compounds and the data predicted by the created models. The results of GNN modeling showed the best predictive characteristics, which include a coefficient of determination between experimentally obtained and predicted HSP values greater than 0.76 for polar and hydrogen bond forces and greater than 0.66 for dispersive forces. Interpreting the fundamental basis of Hansen solubility using the created MLR equations and XGBoost models, HSP values were found to be influenced by van der Waals volume characteristics, 2D matrix molecular representation, and polarity. We elaborated on the practical benefits of using the selected GNN method through Hansen's solubility sphere as an example. This is the first study to demonstrate the advantages of GNN in predicting individual HSP components, as well as the first study to describe in detail their molecular basis using MLR and XGBoost modeling.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.