Abstract

Cyanobacterial harmful algal blooms (CyanoHABs) pose serious risks to inland water resources. Despite advancements in our understanding of associated environmental factors and modeling efforts, predicting CyanoHABs remains challenging. Leveraging an integrated water quality data collection effort in Iowa lakes, this study aimed to identify factors associated with hazardous microcystin levels and develop one-week-ahead predictive classification models. Using water samples from 38 Iowa lakes collected between 2018 and 2021, feature selection was conducted considering both linear and nonlinear properties. Subsequently, we developed three model types (Neural Network, XGBoost, and Logistic Regression) with different sampling strategies using the nine selected variables (mcyA_M, TKN, % hay/pasture, pH, mcyA_M:16S, % developed, DOC, dewpoint temperature, and ortho-P). Evaluation metrics demonstrated the strong performance of the Neural Network with oversampling (ROC-AUC 0.940, accuracy 0.861, sensitivity 0.857, specificity 0.857, LR+ 5.993, and 1/LR- 5.993), as well as the XGBoost with downsampling (ROC-AUC 0.944, accuracy 0.831, sensitivity 0.928, specificity 0.833, LR+ 5.557, and 1/LR- 11.569). This study exhibited the intricacies of modeling with limited data and class imbalances, underscoring the importance of continuous monitoring and data collection to improve predictive accuracy. Also, the methodologies employed can serve as meaningful references for researchers tackling similar challenges in diverse environments.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.