This study investigates machine learning models for predicting the solubility of rivaroxaban in binary solvent mixtures from temperature (T), mass fraction (w), and solvent type. Using a dataset of more than 250 data points, with solvent type represented by one-hot encoding, four tree-based ensemble models were compared: Gradient Boosting (GB), Light Gradient Boosting (LGB), Extra Trees (ET), and Random Forest (RF). Hyperparameters were tuned with the Jellyfish Optimizer (JO) algorithm to improve model performance. The LGB model achieved the best results, with an R² of 0.988 on the test set and low errors (RMSE of 9.1284 × 10⁻⁵ and MAE of 5.85322 × 10⁻⁵), surpassing the other models in predictive accuracy and generalizability. Parity plots confirmed close agreement between the LGB model's predicted and actual solubility values. Furthermore, 3D surface plots and partial effect plots showed that the LGB model captures the complex interactions between T, w, and solvent type across the different solvent systems. Finally, the LGB model predicted maximum solubility at T = 305.76 K and w = 0.753 in a dichloromethane + methanol mixture, a result that can guide solvent selection for solubility optimization. This work underscores the effectiveness of the LGB model for solubility prediction, with potential applications in formulation and experimental planning.
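The input encoding and the three reported metrics can be illustrated with a minimal, standard-library-only sketch. It is not the study's code: the solvent labels other than dichloromethane + methanol, the helper names (`encode_row`, `r2_score`, `rmse`, `mae`), and the numbers in the demo are assumptions chosen for illustration only.

```python
import math

# Hypothetical solvent labels: only "dichloromethane + methanol" is named in
# the abstract; the other two entries are placeholders.
SOLVENTS = ["dichloromethane + methanol", "solvent system B", "solvent system C"]


def encode_row(temperature_k, mass_fraction, solvent):
    """Build one feature row: [T, w, one-hot solvent indicators...]."""
    if solvent not in SOLVENTS:
        raise ValueError(f"unknown solvent: {solvent!r}")
    one_hot = [1.0 if solvent == name else 0.0 for name in SOLVENTS]
    return [temperature_k, mass_fraction] + one_hot


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    # One encoded data point at the conditions quoted in the abstract.
    print(encode_row(305.76, 0.753, "dichloromethane + methanol"))

    # Made-up solubility values, used only to exercise the metric functions.
    y_true = [1.0e-3, 2.0e-3, 3.0e-3, 4.0e-3]
    y_pred = [1.1e-3, 1.9e-3, 3.2e-3, 3.9e-3]
    print(r2_score(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

Rows built this way could be passed to any of the four regressors. One-hot encoding keeps solvent type as a set of independent indicator features rather than imposing an artificial numeric ordering on the solvents.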