Water security remains a critical issue given the looming threat of industrial pollution, necessitating comprehensive assessments of water quality that account for seasonal fluctuations and influential factors and that help decision makers formulate effective strategies. This study introduces a novel approach for evaluating water quality within a complex riverine zone of the Han River in South Korea, which encompasses five river streams situated at the junctions of the North and South streams (including Gyeongan Stream) that ultimately lead towards Paldang Lake. Using monthly water characteristic data from 2013–2022 across 14 locations, significant seasonal trends and potential influences on water quality are identified. Water quality is calculated with the proposed sub-index water quality index (s-WQI). The s-WQI for each location is predicted through a combinatorial approach in which data preprocessing, including Hampel filtering and feature selection, precedes the machine learning predictions. Light gradient boosting (LGB) is the most accurate predictor, outperforming the other prediction algorithms, especially through the LGB-Pearson and LGB-Spearman combinations for the North and South stream intersections and LGB-Pearson for Paldang Lake. To further evaluate the robustness of this evaluation and extend the results to foreseeable scenarios, a season-based Monte Carlo simulation with 10,000 iterations, targeting the water characteristic distributions obtained from each location, is carried out to identify the risk bounds. The results are further interpreted with SHAP analysis to identify the contribution of each water characteristic to water quality at both the local and global levels. This research yields practical implications, offering tailored strategies for water quality enhancement and early warning systems.
The integration of AI-based prediction and feature selection underscores the transformative potential of computational techniques in advancing data-driven water quality assessments, shaping the future of environmental science research.
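As an illustration of the Hampel filtering step named above, the following is a minimal sketch only, not the study's implementation. The function name `hampel_filter`, the window half-width, the 3-sigma threshold, and the sample series are all assumptions chosen for the example; the abstract does not state the settings actually used.

```python
from statistics import median


def hampel_filter(values, half_window=3, n_sigmas=3.0):
    """Replace outliers with the local median (Hampel identifier).

    half_window and n_sigmas are illustrative defaults, not values
    taken from the study.
    """
    k = 1.4826  # scales the MAD to a standard-deviation estimate for Gaussian data
    cleaned = list(values)
    flagged = []
    for i in range(len(values)):
        # Sliding window centred on i, truncated at the series edges.
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        mad = median(abs(v - med) for v in window)
        # A point is an outlier if it sits more than n_sigmas scaled MADs
        # from the window median. With mad == 0, any nonzero deviation is flagged.
        if abs(values[i] - med) > n_sigmas * k * mad:
            cleaned[i] = med
            flagged.append(i)
    return cleaned, flagged


# Hypothetical monthly readings of one water characteristic with a single spike.
series = [2.1, 2.0, 2.2, 2.1, 9.5, 2.0, 2.2, 2.1, 2.0]
cleaned, flagged = hampel_filter(series)
print(flagged)     # indices of the replaced points
print(cleaned[4])  # the spike replaced by its local median
```

Using the median and the median absolute deviation rather than the mean and standard deviation keeps the threshold itself from being inflated by the spike it is meant to detect, which is the usual reason for choosing this filter ahead of a learning step.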