Over the past few decades, harmful algal blooms (HABs) have occurred frequently worldwide. HAB detection based solely on water quality measurements is difficult to generalize across regions. Satellite remote sensing, owing to its low risk, cost-effectiveness, and wide ground coverage, has been used extensively for HAB detection. However, relying solely on remote sensing data leads to false positives, false negatives, and incomplete consideration of the factors that contribute to HABs. This study proposes a HAB detection model that integrates MODIS multifactor data with heterogeneous meteorological data. First, a dataset named MODIS_MI_HABs is constructed from information on 192 HAB events worldwide. Next, remote sensing data for the corresponding regions are collected, all obtained from a Moderate Resolution Imaging Spectroradiometer (MODIS) aboard an ocean-color-detecting satellite. This dataset includes the chlorophyll-a concentration, the sea surface temperature, photosynthetically active radiation, the relative radiation stability differences, six seawater absorption coefficients, and three scattering coefficients. A regression dataset for HAB detection is then established by fusing six meteorological factors and latitude and longitude information with the remote sensing data. Finally, with HAB cell concentration as the data label, seven machine learning models are used to establish correlations between the remote sensing data, the heterogeneous meteorological data, and HAB cell concentrations. The root mean square error (RMSE), mean absolute error (MAE), explained variance (EV), and coefficient of determination (R2) are used to evaluate regression performance.
The results indicate that the extreme gradient boosting (XGR) model has the best predictive capability for harmful algal blooms (leave-one-out: RMSE/MAE = 0.0714). When trained on the entire dataset, the XGR model yields the optimal predictive performance (RMSE = 0.0236, MAE = 0.0151, EV = 0.9593, R2 = 0.9493). Compared with predictions based on fixed-area water quality analysis or on single-source remote sensing data, the proposed approach is widely applicable and offers valuable support for the sustainable development of marine ecology.
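For readers unfamiliar with the four evaluation metrics and the leave-one-out protocol mentioned above, the following is a minimal sketch in plain Python. It is not the authors' code: the function names `regression_metrics`, `leave_one_out_predictions`, and `mean_baseline` are hypothetical, the metric definitions are the standard textbook ones, and a trivial mean predictor stands in for the XGR model purely for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, explained variance (EV), and R2 (standard definitions)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_res = sum(residuals) / n
    ss_res = sum(r * r for r in residuals)              # sum of squared residuals
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("EV and R2 are undefined for a constant target")
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "ev": 1.0 - var_res / (ss_tot / n),  # ignores a constant bias in the residuals
        "r2": 1.0 - ss_res / ss_tot,         # penalizes a constant bias
    }


def leave_one_out_predictions(features, labels, fit_predict):
    """Predict each sample with a model fitted on all the other samples.

    `fit_predict(train_x, train_y, test_x)` is a hypothetical callable that
    fits any regressor on the training split and returns one prediction.
    """
    predictions = []
    for i in range(len(labels)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(fit_predict(train_x, train_y, features[i]))
    return predictions


def mean_baseline(train_x, train_y, test_x):
    """Stand-in model (assumption): always predicts the training-label mean."""
    return sum(train_y) / len(train_y)
```

Because EV subtracts the mean residual before computing the variance, it is never smaller than R2; the two coincide only when the predictions are unbiased on average. The reported full-dataset values (EV = 0.9593, R2 = 0.9493) are consistent with that relationship.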