In this study, two data-driven models, an artificial neural network and support vector regression (SVR), were trained and optimized to predict the biogas yield of the anaerobic digesters at the South-Tehran municipal wastewater treatment plant. An auto-tuning approach combining feature selection with population-based hyperparameter optimization was applied, using the genetic algorithm (GA) and particle swarm optimization, to improve model performance and identify the best input parameters. Shapley Additive Explanations (SHAP) analysis was also performed to interpret the models and attribute each prediction to the individual features. SVR-GA achieved the highest accuracy, with R² values of 0.725 and 0.773 and RMSE values (on the normalized datasets) of 0.477 and 0.492 for the training and test sets, respectively, while requiring the least computational time of the models compared. By removing the less important inputs, the auto-tuning technique showed that temperature, pH, effluent and influent dry solids, effluent volatile solids (VS), and influent VS of the waste sludge were the best input parameters for modeling biogas production. The SHAP analysis revealed that effluent VS (VSeff) and temperature were two of the most important features affecting biogas production, exhibiting an inverse impact.
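For readers unfamiliar with the two reported accuracy measures, the following is a minimal sketch of how R² and RMSE are conventionally computed from observed and predicted values. The function names and the toy numbers are illustrative assumptions, not the study's code or data, and the abstract does not state which normalization scheme was applied before the RMSE was calculated.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy values for illustration only (hypothetical, not from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]

r2 = r2_score(observed, predicted)
error = rmse(observed, predicted)
```

Because RMSE carries the units of the target, a value computed on a normalized target is only comparable with other values computed under the same normalization.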