ABSTRACT Surface ozone pollution has attracted increasing academic attention in recent years because of its significant impacts on air quality and human health. To overcome the limited spatial coverage of ground-based surface ozone monitoring, we propose a novel approach that integrates the Generative Adversarial Network (GAN) with the Light Gradient Boosting Machine (LGBM) for full-coverage surface ozone estimation in the Yangtze River Delta Urban Agglomeration (YRDUA), using ground monitoring data and Sentinel-5P satellite data. We assessed the performance of the GAN-LGBM model against other decision-tree-based models (XGBoost, LGBM) using three cross-validation (CV) methods: sample-based, space-based, and time-based. The results demonstrated that the GAN-LGBM model consistently outperformed the other models across all evaluation metrics and validation scenarios, achieving the highest coefficient of determination (R²) of 0.94 in sample-based CV. Spatiotemporal evaluations further showed the robustness of the GAN-LGBM model and its ability to capture complex surface ozone concentration patterns. This study introduces a promising method for full-coverage surface ozone estimation and explores the potential of incorporating unsupervised learning methods into regression models to address complex correlations in environmental datasets.
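The three CV methods named above differ only in what is held out together: individual records (sample-based), whole monitoring stations (space-based), or whole days (time-based). The following is a minimal sketch of that distinction, not the authors' implementation. The record layout `(station_id, day_index)`, the function name `make_folds`, the fold count `k`, and the round-robin assignment of groups to folds are all assumptions made for illustration; the abstract does not specify them.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout (station_id, day_index); not taken from the paper.
Record = Tuple[str, int]


def make_folds(records: Sequence[Record], scheme: str,
               k: int = 5, seed: int = 0) -> List[List[int]]:
    """Return k lists of held-out record indices for one CV scheme.

    scheme = "sample": each record is its own group (random split of records)
    scheme = "space":  all records of a station are held out together
    scheme = "time":   all records of a day are held out together
    """
    if scheme == "sample":
        keys = list(range(len(records)))
    elif scheme == "space":
        keys = [station for station, _ in records]
    elif scheme == "time":
        keys = [day for _, day in records]
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")

    # Shuffle the distinct groups, then deal them round-robin into k folds.
    groups = sorted(set(keys), key=str)
    random.Random(seed).shuffle(groups)
    fold_of = {group: i % k for i, group in enumerate(groups)}

    folds: List[List[int]] = [[] for _ in range(k)]
    for idx, key in enumerate(keys):
        folds[fold_of[key]].append(idx)
    return folds


if __name__ == "__main__":
    # Toy data: 6 stations observed on 8 days each.
    records = [(f"s{s}", d) for s in range(6) for d in range(8)]
    for scheme in ("sample", "space", "time"):
        sizes = [len(fold) for fold in make_folds(records, scheme, k=3)]
        print(scheme, sizes)
```

In the space-based scheme no station contributes to both the training and the held-out side of a fold, so the score reflects estimation at unmonitored locations; the time-based scheme does the same for unseen days.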