In many coastal cities around the world, ongoing degradation of water quality threatens the living environment of humans and aquatic organisms. To assess and help control water pollution, this study estimated the Biochemical Oxygen Demand (BOD) concentration of Hong Kong's marine waters using remote sensing and an improved machine learning (ML) method. The scheme was derived from four ML algorithms (RBF, SVR, RF, XGB) and calibrated using a large set (N > 1000) of in-situ five-day BOD (BOD5) measurements. Three types of models were trained and evaluated, each on a labeled dataset with different preprocessing: the original BOD5, log10(BOD5), and label distribution smoothing (LDS). The results highlight the superior potential of the LDS-based model to improve BOD5 estimation by compensating for the imbalanced training dataset. Additionally, XGB and RF outperformed RBF and SVR when the model was developed using log10(BOD5) or LDS(BOD5). Over two decades, the autumn (Sep. to Nov.) BOD5 concentration of Hong Kong marine waters showed a downward trend, with significant decreases in Deep Bay, Western Buffer, Victoria Harbour, Eastern Buffer, Junk Bay, Port Shelter, and the Tolo Harbour and Channel. Principal component analysis revealed that nutrient levels were the predominant factor in Victoria Harbour and the interior of Deep Bay, while chlorophyll-related and physical parameters were dominant in Southern, Mirs Bay, Northwestern, and the outlet of Deep Bay. LDS thus provides a new perspective for improving ML-based water quality estimation by alleviating the imbalance in the labeled dataset. Overall, the remotely sensed BOD5 can offer insight into the spatiotemporal distribution of organic matter in Hong Kong coastal waters and valuable guidance for pollution control.
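To make the LDS idea concrete, the following is a minimal sketch of one common way to realize it: bin the labels, smooth the bin counts with a Gaussian kernel, and weight each sample by the inverse of the smoothed density. It is not necessarily the scheme used in this study. The function name `lds_weights`, the bin count, the kernel size, the kernel width, and the synthetic log-normal labels are all illustrative assumptions.

```python
import numpy as np

def lds_weights(labels, n_bins=50, kernel_size=5, sigma=2.0):
    """Per-sample weights from a kernel-smoothed label histogram (LDS-style).

    Hypothetical helper; kernel_size is assumed odd and no larger than n_bins.
    """
    labels = np.asarray(labels, dtype=float)
    # Empirical label density: counts over equal-width bins.
    counts, edges = np.histogram(labels, bins=n_bins)
    # Symmetric Gaussian kernel, normalized to sum to 1.
    offsets = np.arange(kernel_size) - kernel_size // 2
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # Smoothed ("effective") density: convolve the counts with the kernel.
    smoothed = np.convolve(counts, kernel, mode="same")
    # Map each sample to its bin index in 0..n_bins-1 using the interior edges.
    bin_idx = np.digitize(labels, edges[1:-1])
    # Inverse-density weights, rescaled so that the mean weight is 1.
    weights = 1.0 / np.maximum(smoothed[bin_idx], 1e-8)
    return weights * len(weights) / weights.sum()

# Synthetic, right-skewed labels standing in for BOD5 (assumed data, not from the study).
rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.6, size=2000)
w = lds_weights(y)
```

Weights of this kind could then be supplied to a regressor that accepts per-sample weights, for example through the `sample_weight` argument of `fit` in scikit-learn's `RandomForestRegressor`, so that rare high-BOD5 samples contribute more to the training loss.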