Abstract
Data-driven models are efficient decision support tools for monitoring harmful algal blooms (HABs), particularly with the advent of the Internet of Things (IoT) and continuous data acquisition sensing systems. While high chlorophyll-a concentrations in freshwater systems are often used as an index of HABs, diurnal variations in physicochemical water quality parameters influence chlorophyll-a concentrations in these systems. Previous studies have demonstrated the effectiveness of deep learning models in predicting chlorophyll-a as an index of HABs. However, predicting HABs before they occur often comes with uncertainties related to the frequency of observed data. This study used varying sampling periods to develop several Long Short-Term Memory (LSTM) models coupled with a Monte Carlo dropout scheme to estimate model uncertainties for chlorophyll-a predictions in a freshwater lake. The models' performance was compared using indices such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the correlation coefficient. The results showed that, with the conventional 0.2 dropout rate, a daily water quality dataset supported chlorophyll-a predictions 3 days in advance, whereas an hourly dataset supported predictions 7 days in advance. When the effect of the sampling period was considered, the full dataset (daytime and nighttime datasets combined) provided better chlorophyll-a predictions than the models that used only the daytime or only the nighttime dataset. These findings will support timely prediction of HABs using chlorophyll-a as an index, aid in developing early warning systems, and clarify the data-sampling, data-transmission, and data-storage needs that are often critical elements of data-driven modeling.
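As a rough illustration of the two ideas named in the abstract, the sketch below shows how RMSE, MAE, and the Pearson correlation coefficient can be computed, and how a Monte Carlo dropout scheme turns repeated stochastic forward passes into a mean prediction and an uncertainty estimate. It is not the study's code: a toy linear predictor stands in for the LSTM, and every function name, the weights, the feature values, and the number of passes are hypothetical. Only the 0.2 dropout rate is taken from the abstract.

```python
import math
import random
from statistics import mean, stdev


def rmse(obs, pred):
    """Root Mean Square Error between observed and predicted values."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(obs, pred)))


def mae(obs, pred):
    """Mean Absolute Error between observed and predicted values."""
    return mean(abs(o - p) for o, p in zip(obs, pred))


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def dropout_forward(features, weights, bias, rate, rng):
    """One stochastic pass of a toy linear model (a stand-in for the LSTM).

    Each input is dropped with probability `rate`; kept inputs are rescaled
    by 1 / (1 - rate) so the expected output matches the no-dropout output.
    """
    keep = 1.0 - rate
    total = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            total += w * x / keep
    return total


def mc_dropout_predict(features, weights, bias, rate=0.2, passes=200, seed=0):
    """Monte Carlo dropout: keep dropout active at prediction time, repeat the
    forward pass, and summarise the samples as (mean, standard deviation)."""
    rng = random.Random(seed)
    samples = [
        dropout_forward(features, weights, bias, rate, rng)
        for _ in range(passes)
    ]
    return mean(samples), stdev(samples)


if __name__ == "__main__":
    # Hypothetical scaled water quality inputs and weights, for illustration only.
    features = [0.8, 0.3, 0.5]
    weights = [2.0, -1.0, 0.5]
    prediction, uncertainty = mc_dropout_predict(features, weights, bias=1.0)
    print(f"prediction = {prediction:.3f}, uncertainty (std) = {uncertainty:.3f}")
```

In this sketch the spread of the sampled predictions serves as the uncertainty estimate; with the dropout rate set to 0 every pass is identical and the spread collapses to zero.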