Predicting drug solubility in organic solvents mixtures: A machine-learning approach supported by high-throughput experimentation

Francesca Cenci, Samir Diab, Paola Ferrini, Catajina Harabajiu, Massimiliano Barolo, Fabrizio Bezzo, Pierantonio Facco

doi:10.1016/j.ijpharm.2024.124233

Open Access

Journal: International Journal of Pharmaceutics
Publication Date: May 1, 2024
License type: cc-by-nc-nd

Abstract

A novel approach based on supervised machine learning is proposed to predict the solubility of drugs and drug-like molecules in mixtures of organic solvents. As in quantitative structure–property relationship (QSPR) models, different solvent types are identified by molecular descriptors, which in this study are UNIFAC subgroups. To overcome the potential lack of UNIFAC subgroups for the complex Active Pharmaceutical Ingredients (APIs) currently developed in the pharmaceutical industry, the proposed modelling approach treats the API molecule as a single entity. API solubility is therefore predicted as a function of temperature, the functional subgroups of the solvents and the composition of the solvent mixture, and the correlation among these regressors is handled through Partial Least-Squares (PLS) regression. The method is developed and tested with experimental data for a real API in 14 organic solvents that are used industrially for crystallisation. Solubility predictions are accurate and precise for single solvents and for binary and ternary mixtures of organic solvents at different compositions and temperatures, with a coefficient of determination R² ≥ 0.90. To further test its applicability, the proposed approach is applied to nine organic solubility datasets of drugs and drug-like compounds from the literature and compared with benchmark solubility models from the literature. The results show that the proposed approach provides satisfactory predictions: for most calibration and validation data, R² = 0.95–0.99, and the ratio between the RMSE (root mean squared error) of the proposed method and the range of measured solubility values is 1 to 3 orders of magnitude smaller than the corresponding ratio obtained with the benchmark models.
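The modelling idea summarised above (solvent descriptors built from UNIFAC subgroups, combined with temperature and mixture composition, then regressed with PLS) can be illustrated with a minimal sketch. The abstract does not specify how subgroups and composition are combined, which response transformation is used, or how many latent variables are retained, so everything below is an assumption made for illustration. The sketch assumes regressors made of temperature, solvent fractions and fraction-weighted UNIFAC subgroup counts, and a log-solubility response. The four solvents, the helper names (`descriptors`, `pls1_fit`, `pls1_predict`, `rmse_over_range`) and the "true" solubility function are all hypothetical, and the data are synthetic. PLS is implemented here as a plain NumPy NIPALS PLS1 rather than through any particular library.

```python
import numpy as np

# Standard UNIFAC subgroup decompositions of a few illustrative solvents.
SUBGROUPS = ["CH3", "CH2", "CH", "OH", "CH3CO", "ACH", "ACCH3"]
SOLVENTS = {
    "ethanol":    {"CH3": 1, "CH2": 1, "OH": 1},
    "2-propanol": {"CH3": 2, "CH": 1, "OH": 1},
    "acetone":    {"CH3": 1, "CH3CO": 1},
    "toluene":    {"ACH": 5, "ACCH3": 1},
}
NAMES = list(SOLVENTS)
COUNTS = np.array([[SOLVENTS[s].get(g, 0) for g in SUBGROUPS] for s in NAMES], dtype=float)

def descriptors(T, fractions):
    """Hypothetical regressor row: temperature, solvent fractions, fraction-weighted subgroup counts."""
    w = np.asarray(fractions, dtype=float)
    return np.concatenate(([T], w, w @ COUNTS))

def pls1_fit(X, y, n_components):
    """NIPALS PLS1 on autoscaled X and mean-centred y."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0)
    x_std[x_std == 0] = 1.0                      # guard against constant columns
    y_mean = y.mean()
    E, f = (X - x_mean) / x_std, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)                   # weight vector
        t = E @ w                                # latent-variable scores
        tt = t @ t
        p = E.T @ t / tt                         # X loadings
        qa = f @ t / tt                          # y loading
        E = E - np.outer(t, p)                   # deflate X and y
        f = f - qa * t
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)          # coefficients in autoscaled space
    return x_mean, x_std, y_mean, B

def pls1_predict(model, X):
    x_mean, x_std, y_mean, B = model
    return ((X - x_mean) / x_std) @ B + y_mean

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse_over_range(y_true, y_pred):
    """RMSE divided by the range of the measured values."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())

# Synthetic data set: random temperatures (K) and random four-solvent compositions.
rng = np.random.default_rng(0)
n = 60
T = rng.uniform(278.15, 323.15, n)
F = rng.dirichlet(np.ones(len(NAMES)), n)
X = np.array([descriptors(t, f) for t, f in zip(T, F)])
# Invented "true" log-solubility, for illustration only.
y = -2.0 + 0.03 * (T - 298.15) + F @ np.array([0.5, 0.2, 0.8, -0.6]) + rng.normal(0, 0.02, n)

cal, val = slice(0, 45), slice(45, None)         # calibration / validation split
model = pls1_fit(X[cal], y[cal], n_components=4)
y_hat = pls1_predict(model, X[val])
print(f"validation R2 = {r2(y[val], y_hat):.3f}")
print(f"RMSE / range  = {rmse_over_range(y[val], y_hat):.4f}")
```

In this construction the solvent fractions sum to one, and every weighted subgroup count is a linear combination of those fractions. The regressor matrix is therefore strongly collinear (its centred rank is 4 here, although it has 12 columns), so ordinary least squares would be ill-posed. PLS avoids this by regressing on a few latent variables. Note that `n_components` should not exceed that rank, or the deflated matrices vanish.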

Full Text