Current machine learning (ML) applications in atmospheric science focus on forecasting and on bias correction of numerical model estimates, but few studies have examined the nonlinear response of ML predictions to precursor emissions. This study uses ground-level maximum daily 8-hour average ozone (MDA8 O3) as an example and applies Response Surface Modeling (RSM) to examine O3 responses to local anthropogenic NOx and VOC emissions in Taiwan. Three datasets were examined with RSM: Community Multiscale Air Quality (CMAQ) model data, ML-based measurement-model fusion (ML-MMF) data, and ML data. These represent, respectively, direct numerical model predictions, numerical predictions adjusted by observations and other auxiliary data, and ML predictions based on observations and other auxiliary data. The results show that both ML-MMF (r = 0.93–0.94) and ML predictions (r = 0.89–0.94) perform significantly better than CMAQ predictions (r = 0.41–0.80) in the benchmark case. ML-MMF isopleths exhibit O3 nonlinearity close to the actual responses because they combine a numerical-model basis with observation-based correction. In contrast, compared with ML-MMF isopleths, ML isopleths give biased predictions with a different controllable range of O3 and distorted O3 responses to NOx/VOC emission ratios. This implies that predicting air quality from data without support from CMAQ modeling could mislead control targets and projections of future trends.
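The two quantities behind the benchmark comparison, the MDA8 O3 metric and Pearson's correlation coefficient r, can be sketched as follows. This is a minimal illustration assuming 24 complete hourly O3 values per day; regulatory MDA8 definitions add details omitted here, such as 8-hour windows that extend into the next day and minimum data-completeness rules. The function names `mda8` and `pearson_r` are hypothetical and are not taken from the study.

```python
import math
from statistics import fmean


def mda8(hourly):
    """Maximum daily 8-hour average from 24 hourly values.

    Simplified: only the 17 windows lying fully inside the day are used,
    and no missing-data rules are applied.
    """
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    # Running 8-hour means starting at hours 0..16
    windows = [fmean(hourly[i:i + 8]) for i in range(24 - 8 + 1)]
    return max(windows)


def pearson_r(x, y):
    """Pearson correlation between predictions x and observations y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic day: an 8-hour plateau of 60 between lower background values
day = [20.0] * 10 + [60.0] * 8 + [20.0] * 6
print(mda8(day))                            # 60.0
print(pearson_r([1, 2, 3], [2, 4, 6]))      # 1.0
```

In a model evaluation, `pearson_r` would be applied to paired daily MDA8 values (modeled versus observed) at each monitoring site or region.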
Meanwhile, the observation-corrected ML-MMF isopleths also highlight the impact of transboundary pollution from mainland China on regional O3 sensitivity to local NOx and VOC emissions: in April, transboundary NOx would make all air quality regions more sensitive to local VOC emissions and limit the potential benefit of reducing local emissions. Future ML applications in atmospheric science, such as forecasting or bias correction, should provide interpretability and explainability in addition to meeting statistical performance requirements and reporting variable importance. Assessment against interpretable physical and chemical mechanisms should be regarded as equally important as constructing a statistically robust ML model.