Dye contamination of water sources poses severe environmental and public health risks and therefore calls for effective monitoring and remediation strategies. This study uses machine learning to develop predictive models for evaluating methylene blue (MB) dye degradation in contaminated water. Ten machine learning models (AdaBoost, Bagging, CatBoost, Decision Tree, Extra Trees, Gradient Boosting, HistGradientBoosting, LightGBM, Random Forest, and XGBoost) were evaluated, with CuWO₄@TiO₂ as the photocatalyst. Model performance was assessed using R², MSE, RMSE, MAE, and MedAE. Among all models, HistGradientBoosting gave the best-balanced performance, reaching an R² of 0.9998 on the training set and 0.9915 on the test set together with low error metrics, which indicates strong generalization. Gradient Boosting and CatBoost also showed strong predictive performance, whereas the AdaBoost and Decision Tree models suffered from overfitting. The maximum predicted MB degradation obtained with the integrated approach was 98.99%, and experimental validation under the optimized conditions reached 98.5%. The optimized conditions were an initial MB concentration of 10 mg/L, a CuWO₄@TiO₂ photocatalyst dosage of 200.33 mg/L, a light intensity of 150 mW/cm², and a contact time of 88.6 minutes at room temperature and near-neutral pH (7.0). The final model metrics were an R² of 0.9915, MedAE of 1.171, MSE of 5.634, MAE of 1.735, and RMSE of 2.374. This work highlights the potential of combining advanced machine learning algorithms with metaheuristic optimization to improve photocatalytic processes, opening a promising avenue for practical water treatment applications.
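For readers unfamiliar with the five reported metrics, the following is a minimal sketch of how they are conventionally computed from observed and predicted degradation percentages. It is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, chosen only for illustration, and the study's actual data are not reproduced here.

```python
import math
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Return R², MSE, RMSE, MAE and MedAE using their standard definitions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(r) for r in residuals]

    # Residual and total sums of squares for the coefficient of determination.
    ss_res = sum(r * r for r in residuals)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")

    mse = ss_res / len(residuals)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": math.sqrt(mse),      # RMSE is the square root of MSE
        "MAE": mean(abs_errors),     # mean absolute error
        "MedAE": median(abs_errors), # median absolute error
    }


# Hypothetical degradation percentages, for illustration only.
observed = [90.0, 95.0, 85.0, 98.0]
predicted = [91.0, 94.0, 87.0, 98.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Because RMSE is defined as the square root of MSE, the reported values can be cross-checked against each other: √5.634 ≈ 2.374, which matches the stated RMSE.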