Abstract

Accurate prediction of water quality (WQ) parameters is considered a pivotal decision tool in sustainable water resources management. In this study, five ensemble machine learning (ML) models, namely Quantile Regression Forest (QRF), Random Forest (RF), radial support vector machine (SVM), Stochastic Gradient Boosting (GBM), and Gradient Boosting Machines (GBM_H2O), were developed to predict the monthly biochemical oxygen demand (BOD) values of the Euphrates River, Iraq. For this purpose, monthly average data of water temperature (T), turbidity, pH, electrical conductivity (EC), alkalinity (Alk), calcium (Ca), chemical oxygen demand (COD), sulfate (SO₄), total dissolved solids (TDS), total suspended solids (TSS), and BOD, measured over a ten-year period, were used. The performance of these standalone models was compared with that of integrative models developed by coupling the applied ML models with two feature extraction algorithms, i.e., Genetic Algorithm (GA) and Principal Components Analysis (PCA). The reliability of the applied models was evaluated using the statistical performance criteria of coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), Nash-Sutcliffe model efficiency coefficient (NSE), Willmott index (d), and percent bias (PBIAS). Results showed that, among the developed models, the QRF model attained the best performance. The integrative PCA-QRF model performed much better than the standalone models and those integrated with GA. The R², RMSE, MAE, NSE, d, and PBIAS values of PCA-QRF were 0.94, 0.12, 0.05, 0.93, 0.98, and 0.3, respectively.
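The six evaluation criteria named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `evaluate` and the sample values are hypothetical, R² is assumed here to be the squared Pearson correlation, and PBIAS is assumed to follow the convention 100 · Σ(sim − obs) / Σ(obs) (some references use the opposite sign).

```python
import math


def evaluate(obs, sim):
    """Return R2, RMSE, MAE, NSE, Willmott d and PBIAS for paired series.

    Assumptions (not taken from the paper): R2 is the squared Pearson
    correlation, and PBIAS = 100 * sum(sim - obs) / sum(obs).
    """
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("obs and sim must have the same length (>= 2)")

    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    errors = [s - o for o, s in zip(obs, sim)]
    sse = sum(e * e for e in errors)  # sum of squared errors

    # Spread of each series around its own mean, and their co-variation.
    ss_o = sum((o - mean_o) ** 2 for o in obs)
    ss_s = sum((s - mean_s) ** 2 for s in sim)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))

    # Willmott's index compares errors with the "potential error".
    potential = sum(
        (abs(s - mean_o) + abs(o - mean_o)) ** 2 for o, s in zip(obs, sim)
    )

    return {
        "R2": cov ** 2 / (ss_o * ss_s),
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "NSE": 1.0 - sse / ss_o,
        "d": 1.0 - sse / potential,
        "PBIAS": 100.0 * sum(errors) / sum(obs),
    }


if __name__ == "__main__":
    # Hypothetical observed and simulated values, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0]
    simulated = [2.0, 3.0, 4.0, 5.0]
    for name, value in evaluate(observed, simulated).items():
        print(f"{name}: {value:.3f}")
```

In this sketch a constant offset between the two series leaves R² at 1 while lowering NSE and d, which is one reason several criteria are usually reported together.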

Highlights

  • Biochemical oxygen demand (BOD) and dissolved oxygen (DO) have been commonly used water quality (WQ) parameters for decades; this study focused on the prediction of both parameters, as their accurate prediction is essential for supporting protective initiatives

  • The integration of the Principal Components Analysis (PCA) and Genetic Algorithm (GA) feature selection approaches was investigated as the second modeling scenario

  • Because the careful selection of which predictors (water quality parameters) to include in the prediction formula has a more advantageous effect on overall performance than the choice of the modeling algorithm itself, feature selection approaches were employed to identify the minimal subset of features for optimal learning

Summary

Introduction

Owing to the reliance of human life on water availability, both surface water and groundwater bodies are exposed to varying levels of contamination from different pollutants [4], [5]. Poor drainage and irrigation practices have brought about a low water table and soil salinization in the region, and agricultural development and other human activities have affected the quality of water in the Euphrates Basin. These impacts are not obvious at the point of water

