Abstract

In this study, K-nearest neighbors (KNN), support vector regression (SVR), decision tree (DT), and random forest (RF) algorithms, each integrated with the Bayesian optimization algorithm (BOP), were applied as novel hybrid modeling/optimization tools to predict the total nitrogen (TN) in the treated wastewater of the Southern Tehran Wastewater Treatment Plant (STWWTP). To improve on the hybrid models, selected sub-models (the best-performing and least correlated hybrid models) were combined into voting average and stacked regression ensemble models. In the preprocessing step, two alternative scenarios were used to handle missing values in the samples: elimination of the affected samples and estimation via linear interpolation. The results showed that ensemble models generally outperformed the individual hybrid models, although not every ensemble model was superior to every single model. The stacking regression ensemble using KNN-BOP and SVR-BOP as sub-models was the best of the developed models, with a coefficient of determination (R²) of 0.640, a root mean squared error (RMSE) of 2.378, and a mean absolute error (MAE) of 1.838 on the test data. A hybrid ensemble model that accurately predicts the TN concentration in the effluent can provide early warning of water pollution caused by eutrophication.
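The two missing-value scenarios, the voting average, and the three reported metrics can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' implementation: the function names, the use of `None` to mark a missing value, the choice to leave leading and trailing gaps unfilled, and the toy numbers are all assumptions made for illustration. The stacked regression ensemble and the Bayesian hyperparameter search are not shown.

```python
import math


def drop_missing(series):
    """Scenario 1: eliminate samples whose value is missing (None)."""
    return [v for v in series if v is not None]


def interpolate_missing(series):
    """Scenario 2: fill interior gaps by linear interpolation.

    Gaps before the first or after the last observed value are left
    as None (an assumption; the abstract does not say how they are handled).
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


def voting_average(*predictions):
    """Average the predictions of several sub-models, sample by sample."""
    return [sum(p) / len(p) for p in zip(*predictions)]


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy TN series with gaps; the values are invented for illustration.
    tn = [10.0, None, 14.0, None, None, 20.0]
    print(drop_missing(tn))
    print(interpolate_missing(tn))

    # Hypothetical predictions from two sub-models on three test samples.
    y_true = [12.0, 15.0, 18.0]
    blended = voting_average([11.0, 16.0, 17.0], [13.0, 15.0, 20.0])
    print(r2(y_true, blended), rmse(y_true, blended), mae(y_true, blended))
```

Elimination shortens the series, while interpolation keeps every time step at the cost of introducing estimated values, which is one reason the two scenarios can lead to different model rankings.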
