Algal blooms are a widespread issue in eutrophic lakes. Compared with the satellite-derived surface algal bloom area and chlorophyll-a (Chla) concentration, algal biomass is a more stable indicator of water quality. Although satellite data have been used to observe water-column-integrated algal biomass, previous methods are mostly empirical algorithms, which are not stable enough for widespread use. This paper proposes a machine learning algorithm based on Moderate Resolution Imaging Spectroradiometer (MODIS) data to estimate algal biomass, and applies it successfully to Lake Taihu, a eutrophic lake in China. The algorithm was developed by linking Rayleigh-corrected reflectance to in situ algal biomass data from Lake Taihu (n = 140), and several mainstream machine learning (ML) methods were compared and validated. Partial least squares regression (PLSR) (R² = 0.67, mean absolute percentage error (MAPE) = 38.88 %) and support vector machines (SVM) (R² = 0.46, MAPE = 52.02 %) performed less satisfactorily. In contrast, the random forest (RF) and extreme gradient boosting (XGBoost) algorithms had higher accuracy (RF: R² = 0.85, MAPE = 22.68 %; XGBoost: R² = 0.83, MAPE = 24.06 %), demonstrating greater application potential in algal biomass estimation. Field biomass data were further used to evaluate the RF algorithm, which showed acceptable precision (R² = 0.86, MAPE < 7 mg Chla). Subsequently, sensitivity analysis showed that the RF algorithm was not sensitive to high suspended matter or aerosol thickness (rate of change <2 %), and inter-day and consecutive-day verification showed that it was stable (rate of change <5 %). The algorithm was also extended to Lake Chaohu (R² = 0.93, MAPE = 18.42 %), demonstrating its potential in other eutrophic lakes. This study provides a more accurate and more widely applicable technical means of estimating algal biomass for the management of eutrophic lakes.
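The two accuracy metrics reported above can be illustrated with a minimal sketch. It assumes that R² denotes the coefficient of determination (1 − SS_res/SS_tot) and that MAPE is the mean of |observed − predicted| / observed, expressed in percent; the abstract does not state the exact definitions used. The function names and the sample values are hypothetical and are not data from the study.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared differences between observation and prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviations of observations from their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two equal-length, non-empty sequences")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Made-up values for illustration only; NOT measurements from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 34.0]

print(f"R2 = {r_squared(observed, predicted):.2f}, "
      f"MAPE = {mape(observed, predicted):.2f} %")
```

Under these definitions, a model such as RF or XGBoost would be preferred over PLSR or SVM when it yields a higher R² and a lower MAPE on the same validation samples.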