Abstract

Harmful algal blooms (HABs) pose a potential risk to human and ecosystem health. HAB occurrences are influenced by numerous environmental factors; thus, accurate predictions of HABs, together with explanations of those predictions, are required to implement preventive water quality management. In this study, two machine learning (ML) algorithms, random forest (RF) and extreme gradient boosting (XGB), were employed to predict HABs in eight water supply reservoirs in South Korea. Using the synthetic minority oversampling technique (SMOTE) to address the imbalance in HAB occurrences improved the classification performance of the ML algorithms. Although RF and XGB differed only marginally in performance, XGB exhibited more stable performance in the presence of data imbalance. Furthermore, a post hoc explanation technique, Shapley additive explanations (SHAP), was employed to estimate relative feature importance. Among the input features, water temperature and the concentrations of total nitrogen and total phosphorus appeared to be important in predicting HAB occurrences. The results suggest that the use of ML algorithms along with explanation methods increases the usefulness of predictive models as a decision-making tool for water quality management.
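To illustrate the oversampling step mentioned above, the following is a minimal sketch of the core idea behind SMOTE: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. It is not the implementation used in the study. The function name `smote`, the parameter values, and all sample values are assumptions made up for illustration; in practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used, and features would be scaled before distances are computed.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the core idea of SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself).
        # Note: distances here use raw, unscaled features for brevity.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical, made-up samples (NOT data from the study):
# (water temperature in deg C, total nitrogen in mg/L, total phosphorus in mg/L)
no_bloom = [
    (12.0, 1.8, 0.02), (15.5, 2.1, 0.03), (10.2, 1.5, 0.02), (18.0, 2.0, 0.03),
    (14.1, 1.9, 0.02), (16.7, 2.2, 0.04), (11.3, 1.6, 0.02), (19.2, 2.3, 0.04),
    (13.4, 1.7, 0.03), (17.5, 2.0, 0.03),
]
bloom = [(26.0, 2.9, 0.08), (27.5, 3.2, 0.10), (25.0, 2.7, 0.07), (28.0, 3.0, 0.09)]

# Oversample the minority (bloom) class until both classes have equal size.
new_bloom = smote(bloom, n_new=len(no_bloom) - len(bloom))
balanced_bloom = bloom + new_bloom
print(len(no_bloom), len(balanced_bloom))
```

Because every synthetic point is a convex combination of two real bloom samples, it stays within the range of the observed minority class. Oversampling of this kind is normally applied to the training split only, so that the evaluation data keep the original class balance.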

