This study evaluates models for predicting volatile fatty acid (VFA) concentrations in sludge processing, ranging from classical statistical methods (Gaussian and Surge) to diverse machine learning algorithms (MLAs) such as Decision Tree, XGBoost, CatBoost, LightGBM, Multiple linear regression (MLR), Support vector regression (SVR), AdaBoost, and GradientBoosting. Anaerobic bio-methane potential tests were carried out using domestic wastewater treatment primary and secondary sludge. The tests were monitored over 40days for variations in pH and VFA concentrations under different experimental conditions. The data observed was compared to predictions from the Gaussian and Surge models, and the MLAs. Based on correlation analysis using basic statistics and regression, the Gaussian model appears to be a consistent performer, with high R2 values and low RMSE, favoring precision in forecasting VFA concentrations. The Surge model, on the other hand, albeit having a high R2, has high prediction errors, especially in dynamic VFA concentration settings. Among the MLAs, Decision Tree and XGBoost excel at predicting complicated patterns, albeit with overfitting issues. This study provides insights underlining the need for context-specific considerations when selecting models for accurate VFA forecasts. Real-time data monitoring and collaborative data sharing are required to improve the reliability of VFA prediction models in AD processes, opening the way for breakthroughs in environmental sustainability and bioprocessing applications.
Read full abstract