Abstract
Nitrogen oxides (NOx = NO + NO2) are of great concern because of their impact on human health and the environment. In recent years, rapid advances in computational power and big data have made machine learning (ML) techniques widely used for surface NO2 estimation. However, the uncertainties inherent to such retrievals are rarely studied. In this study, we develop a novel ML framework, enhanced with uncertainty quantification techniques, that estimates surface NO2 and provides the corresponding data‐induced uncertainty. We apply the Boosting Ensemble Conformal Quantile Estimator (BEnCQE) model to infer surface NO2 concentrations over Western Europe at the daily scale and 1 km spatial resolution from May 2018 to December 2021. High NO2 concentrations appear mainly in urban areas, in industrial areas, and along roads. Space‐based cross‐validation shows that our model achieves accurate point estimates (r = 0.8, R² = 0.64, root mean square error = 8.08 μg/m³) and reliable prediction intervals (coverage probability, PI‐50%: 51.0%, PI‐90%: 90.5%). The model results also agree with the Copernicus Atmosphere Monitoring Service (CAMS) model. The quantile regression in our model shows how the importance of each predictor varies across NO2 levels. Additionally, the uncertainty information reveals additional potential exceedances of the World Health Organization (WHO) 2021 limit in some locations, which point estimates alone cannot detect. The uncertainty quantification also allows the model's robustness to be assessed away from existing in‐situ measurement stations. It reveals the challenges of NO2 estimation over urban and mountainous areas, where NO2 is highly variable and heterogeneously distributed.
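The prediction-interval coverage and exceedance ideas mentioned above can be illustrated with a minimal sketch of split conformalized quantile regression. This is not the authors' BEnCQE implementation: the synthetic data, the deliberately narrow raw intervals, and the helper names (`conformal_margin`, `coverage`, `simulate`) are all hypothetical, and the 24-hour WHO 2021 guideline value of 25 μg/m³ is an assumption, since the abstract does not say which WHO limit is meant.

```python
import math
import random


def conformal_margin(y_cal, lo_cal, hi_cal, alpha):
    """Margin that widens raw quantile intervals to target 1 - alpha coverage."""
    # Conformity score: distance by which an observation falls outside its
    # raw interval (negative when the observation lies inside it).
    scores = sorted(max(lo - y, y - hi)
                    for y, lo, hi in zip(y_cal, lo_cal, hi_cal))
    n = len(scores)
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)), capped at n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1]


def coverage(y, lo, hi):
    """Fraction of observations that fall inside [lo, hi]."""
    inside = sum(1 for yi, l, h in zip(y, lo, hi) if l <= yi <= h)
    return inside / len(y)


random.seed(0)
SIGMA = 8.0  # hypothetical noise level, in ug/m3


def simulate(n):
    """Synthetic stand-in for a model's point estimates and raw quantiles."""
    mu = [random.uniform(5.0, 60.0) for _ in range(n)]   # point estimates
    y = [m + random.gauss(0.0, SIGMA) for m in mu]       # "observations"
    lo = [m - 0.5 * SIGMA for m in mu]                   # raw lower bound (too narrow)
    hi = [m + 0.5 * SIGMA for m in mu]                   # raw upper bound (too narrow)
    return mu, y, lo, hi


mu_cal, y_cal, lo_cal, hi_cal = simulate(2000)  # calibration set
mu_new, y_new, lo_new, hi_new = simulate(2000)  # held-out set

alpha = 0.10  # target a 90% prediction interval
q = conformal_margin(y_cal, lo_cal, hi_cal, alpha)
lo_adj = [l - q for l in lo_new]
hi_adj = [h + q for h in hi_new]

raw_cov = coverage(y_new, lo_new, hi_new)
adj_cov = coverage(y_new, lo_adj, hi_adj)

# Assumed threshold: WHO 2021 24-hour NO2 guideline value (ug/m3).
WHO_LIMIT = 25.0
# Cases where the point estimate is within the limit but the upper bound is not.
flagged = [i for i, (m, h) in enumerate(zip(mu_new, hi_adj))
           if m <= WHO_LIMIT < h]

print(f"raw coverage: {raw_cov:.3f}, calibrated coverage: {adj_cov:.3f}")
print(f"potential exceedances missed by point estimates: {len(flagged)}")
```

In this sketch, the calibrated coverage on held-out data should land close to the nominal 90%, which is the property the reported PI‐50% and PI‐90% coverage probabilities check. The `flagged` list shows how an upper interval bound can indicate a possible exceedance that the point estimate alone would not.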