Abstract

Cyanobacterial blooms arise from a complex combination of causes, including water quality, climate, and hydrological factors. This study presents machine learning models that predict the occurrence of these blooms efficiently and effectively. The target variable, cyanobacterial cell density one week ahead, was used to divide the dataset into groups of two, three, or four classes. We developed 96 machine learning models for the Chilgok weir using four classification algorithms: k-Nearest Neighbor, Decision Tree, Logistic Regression, and Support Vector Machine. For feature selection, that is, the removal of features irrelevant to the target variable, we first screened input features with ANOVA (analysis of variance) and then resolved multicollinearity among them. Next, we applied oversampling to correct the imbalance of the dataset. The best performance was achieved by models using the two-class datasets, with accuracy of 80% or more, whereas models using the three-class datasets reached a comparatively low accuracy of approximately 60%. Moreover, while models that used logCyano (the logarithm of cyanobacterial cell density) as a feature achieved high accuracy overall, several two-class models combining air temperature and NO3-N (nitrate nitrogen) also exceeded 80% accuracy. We conclude that highly accurate classification-based machine learning models can be developed with only two features related to cyanobacterial blooms, showing that efficient and effective models can be built with few inputs.
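The two-stage feature selection described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: features are screened by comparing a one-way ANOVA F statistic against a hypothetical cut-off `f_min` (the study's actual criterion, for example a p-value threshold, is not given here), and multicollinearity is handled by a simple pairwise Pearson-correlation filter with a hypothetical limit `r_max`, used as a stand-in for checks such as the variance inflation factor. The helper names and the feature values are toy placeholders, not data from the Chilgok weir.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    k, n = len(groups), len(values)
    grand = mean(values)
    means = {lab: mean(g) for lab, g in groups.items()}
    ss_between = sum(len(g) * (means[lab] - grand) ** 2 for lab, g in groups.items())
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf")  # classes perfectly separated by this feature
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature has no linear relationship
    return sxy / (sxx * syy) ** 0.5


def select_features(features, labels, f_min=4.0, r_max=0.8):
    """Hypothetical filter: ANOVA screen, then drop collinear features."""
    scores = {name: anova_f(col, labels) for name, col in features.items()}
    # Stage 1: keep features whose F statistic reaches f_min, strongest first.
    ranked = sorted((n for n in scores if scores[n] >= f_min),
                    key=scores.get, reverse=True)
    # Stage 2: keep a feature only if it is not highly correlated
    # with any stronger feature that has already been kept.
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < r_max for k in kept):
            kept.append(name)
    return kept, scores


# Toy data (not from the study): two classes, four candidate features.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "air_temp":   [10, 12, 11, 13, 25, 27, 26, 28],
    "water_temp": [11, 13, 12, 14, 26, 28, 27, 29],  # air_temp + 1: collinear
    "nitrate":    [3, 1, 2, 0, 6, 4, 5, 3],
    "noise":      [5, 1, 4, 2, 3, 5, 1, 4],
}
kept, scores = select_features(features, labels)
print(kept)  # ['air_temp', 'nitrate']
```

In this toy run, `noise` fails the ANOVA screen, and `water_temp` passes it but is dropped because it duplicates the information in the already-kept `air_temp`. Ranking by F before the collinearity check means that, of two collinear features, the more discriminative one survives.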

Highlights

  • The main objective of this study is to develop optimal classification-based machine learning models that efficiently and effectively predict the occurrence of cyanobacterial blooms, using feature selection and oversampling of the datasets

  • The target variable was a class derived from Cyano(t+1), the cyanobacterial cell density one week ahead, with class labels defined for each group: Normal/Caution/Warning/Outbreak for Group1, Normal/Occurrence for Group2, and None/Normal/Occurrence for Group3
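Building the one-week-ahead target and balancing the classes can be sketched as follows. The class thresholds are not stated in this text, so the 1,000 cells/mL cut-off used for a two-class (Normal/Occurrence) split is a hypothetical placeholder, as are the helper names and the toy weekly values. The same `label_density` helper covers a four-class scheme such as Group1 by passing three ascending thresholds. Plain random oversampling (duplicating minority-class rows) is used for illustration; the oversampling method used in the study may differ, for example a synthetic approach such as SMOTE.

```python
import random
from bisect import bisect_right
from collections import Counter


def label_density(density, thresholds, names):
    """Map a cell density (cells/mL) to a class name.

    `thresholds` must be ascending and `names` one element longer; a density
    equal to a threshold falls into the higher class.
    """
    return names[bisect_right(thresholds, density)]


def one_week_ahead(rows, densities, thresholds, names):
    """Pair the features of week t with the class of Cyano(t+1)."""
    X = rows[:-1]  # the last week has no following week to predict
    y = [label_density(d, thresholds, names) for d in densities[1:]]
    return X, y


def random_oversample(X, y, seed=0):
    """Duplicate randomly drawn rows of each smaller class until every
    class has as many rows as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == cls]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(cls)
    return X_out, y_out


# Hypothetical two-class split (Normal/Occurrence) at 1,000 cells/mL.
THRESHOLDS = [1000]
NAMES = ["Normal", "Occurrence"]

# Toy weekly records (not from the study): [air_temp, NO3-N] and Cyano density.
rows = [[18, 2.1], [21, 1.9], [24, 1.7], [23, 1.8], [26, 1.5], [28, 1.4], [25, 1.6]]
densities = [200, 500, 1500, 300, 800, 12000, 400]

X, y = one_week_ahead(rows, densities, THRESHOLDS, NAMES)
print(Counter(y))      # 4 Normal, 2 Occurrence
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # 4 Normal, 4 Occurrence
```

In practice, oversampling is applied only to the training split, so that duplicated rows cannot appear in both training and evaluation data and inflate the reported accuracy.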

Introduction

Harmful Algal Blooms (HABs) have arisen from the pollution of aquatic environments and, increasingly, from climate change, which has raised water temperatures [1,2]. There is growing concern that the combined effects of uncontrolled pollution and climate change (e.g., higher temperatures) may lead to more frequent and more severe HABs [3,4,5]. HABs harm the aquatic environment and human health because they produce toxic substances [6] such as microcystin [7,8]. The seriousness of HABs is evident from studies showing that algal (or cyanobacterial) blooms have caused fish deaths [9,10] and human liver disease [11]. For water management, preventing or minimizing HABs is challenging because HAB processes are complex (including the identification of their main conditioning factors), site-specific, and therefore difficult to predict [12,13].
