Abstract

River systems provide multiple ecosystem services to society globally, but these are already degraded or threatened in many areas of the world due to water quality issues linked to diffuse and point-source pollutant inputs. Water quality evaluation is essential to develop remediation and management strategies. Computational tools such as machine learning based predictive models have been developed to improve monitoring network capabilities. The model's performance is reduced when datasets composed of reductant information are used for training, on the other hand, the selection of most representative and variable water quality scenarios could result in higher precision. This study analyzed historical water quality behavior in the Santiago River, Mexico, to identify the most variable and representative data available to train machine learning models (Adaptive Neuro Fuzzy Inference System – ANFIS, Artificial Neural Network – ANN, and Support Vector Machine - SVM). Thirteen monitoring sites were clustered according to their water quality variability from 2009 to 2022. Subsequently, a Time Series Analysis (TSA) was used to select the most representative monitoring station from each cluster. Data for 6/13 monitoring sites were retained for the Best Training Subset (BTS) used to train restricted models that performed with similar (ANN and SMV) or higher (ANFIS) prediction accuracy (in terms of RMSE, MAE, MSE and R2) for both training and testing. This study provides evidence of water quality data containing redundant information that is not useful to improve machine learning model performance, in turn leading to overtraining. Combined analytical approaches can maximize the representativeness and variability of data selected for machine learning applications, leading to improved prediction.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.