Abstract

Well-estimated air pollutant concentration fields are critically important for compensating for observations that are only sparsely available, especially over non-urban areas. Previous data fusion methods generally used statistical models to relate observations of target variables to proxy data and supporting variables at known stations. In this study, we developed a new data fusion paradigm by designing a deep-learning model framework and workflow that learns multivariable spatial correlations from chemical transport model (CTM) simulations and then uses them to estimate PM2.5 reanalysis fields from station observations. The model is composed of two modules: an explainable PointConv operation that pre-processes isolated observations and a grid-to-grid regression network that builds correlations among multiple variables. The model was trained with only CTM simulations and supporting geographical covariates. The trained model was evaluated in two respects: (1) reproducing raw PM2.5 CTM simulations and (2) generating reanalysis and fused PM2.5 fields. First, the model reproduced the CTM simulations well over the full domain from CTM data sampled at sparse locations, with an average R2 = 0.94 and RMSE = 4.85 µg m−3. Second, the fused PM2.5 fields estimated from observations performed well, with R2 = 0.77 (RMSE = 14.29 µg m−3) at the stringent city level and R2 = 0.84 (RMSE = 12.96 µg m−3) at the station level. The generated reanalysis PM2.5 fields have complete spatial coverage within the modeling domain. One significant benefit of the fusion framework is that model training does not rely on observations, so the trained model can be used to predict PM2.5 fields for newly set up observation networks such as those using portable sensors. In the prediction procedure, only station observations are used, along with supporting covariates.
The fusion model has high computing efficiency (< 1 s d−1) owing to acceleration on a graphics processing unit (GPU). As an alternative way to generate chemical reanalysis fields, the method can be readily implemented in near-real time and applied to other simulated variables for which measurements are available.
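The abstract states that the model is trained on CTM simulations only and learns to reproduce a full-domain field from CTM values sampled at sparse locations. A minimal sketch of how one such training pair might be built is shown here. The function name `sample_sparse`, the toy field, and the use of a zero fill with a separate mask are illustrative assumptions, not the authors' implementation, and the PointConv pre-processing itself is not reproduced.

```python
import random

def sample_sparse(field, n_stations, seed=0):
    """Build a sparse input from a full 2-D field (a list of rows).

    Hypothetical helper: keeps the simulated value at n_stations randomly
    chosen grid cells and 0.0 elsewhere. The returned mask is 1 at the
    chosen cells and 0 elsewhere, so zeros in the sparse grid are not
    confused with real zero concentrations.
    """
    n_rows, n_cols = len(field), len(field[0])
    rng = random.Random(seed)
    # Draw distinct flat cell indices to act as pseudo-station locations.
    cells = rng.sample(range(n_rows * n_cols), n_stations)
    sparse = [[0.0] * n_cols for _ in range(n_rows)]
    mask = [[0] * n_cols for _ in range(n_rows)]
    for cell in cells:
        i, j = divmod(cell, n_cols)
        sparse[i][j] = field[i][j]
        mask[i][j] = 1
    return sparse, mask

# Toy 4 x 5 "simulated PM2.5" field with made-up values (µg m-3).
field = [[10.0 + 2 * i + j for j in range(5)] for i in range(4)]

# Network input: (sparse, mask); training target: the full field.
sparse, mask = sample_sparse(field, n_stations=6)
```

Under this assumed setup, the full simulated field serves as the regression target, and at prediction time the sampled values would be replaced by real station observations placed on the same grid.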

Highlights

  • Pollutant concentration fields with high accuracy are important for evaluating health effects, climate change, and agricultural studies (Bell et al., 2007; Donkelaar et al., 2015; Gao et al., 2017)

  • The chemical transport model (CTM) simulations of PM2.5 concentrations have reasonable performance when evaluated against surface measurements, with a root mean square error (RMSE) of 29.28–31.08 µg m−3 and a coefficient of determination (R2) of 0.31–0.42 (Fig. S1 in the Supplement)

  • The model was trained with the 1 d lead CTM simulations of PM2.5, relative humidity (RH), and wind speed (WS), together with geophysical covariates of digital elevation model (DEM) and land use and land cover (LULC)
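The RMSE and R2 scores quoted in the highlights and the abstract can be computed from paired observed and estimated concentrations. The sketch below is one common way to do so. It assumes R2 is defined as 1 − SS_res/SS_tot; the excerpt does not say whether the authors used this definition or the squared Pearson correlation, and the sample values are made up for illustration.

```python
import math

def rmse(obs, pred):
    """Root mean square error between paired observations and estimates."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r_squared(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Made-up PM2.5 values (µg m-3) at four hypothetical stations.
obs = [35.0, 50.0, 80.0, 120.0]
pred = [40.0, 45.0, 90.0, 110.0]

print(f"RMSE = {rmse(obs, pred):.2f}, R2 = {r_squared(obs, pred):.2f}")
```

With this definition R2 can be negative for a poor estimate, whereas a squared correlation cannot, so the two conventions are not interchangeable when comparing scores.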

Summary

Introduction

Pollutant concentration fields with high accuracy are important for evaluating health effects, climate change, and agricultural studies (Bell et al., 2007; Donkelaar et al., 2015; Gao et al., 2017). Even though many datasets have been developed through deliberately designed statistical models, long-term observations, and extensive explanatory variables, this paradigm for developing air pollutant fields leaves scientific gaps in many circumstances. These models usually rely on long-term and large-scale station observations for training, especially complex time- and space-resolved models (Feng et al., 2020; Huang et al., 2021). In near-real-time operational data fusion applications, adjoint models need to run simultaneously (Friberg et al., 2016), which is computationally costly. To address these gaps, this study proposes a new data fusion paradigm: a deep-learning-based model framework that estimates reanalysis fields from station observations by learning spatiotemporal correlations from deterministic CTMs. The framework is fundamentally an alternative way of generating chemical reanalysis fields without rerunning CTMs with data assimilation.

CTM simulations
Ground observations
Deep-learning data fusion framework
Model training
Model evaluation
Model parameters
Model performance for reproducing simulation fields
Model performance for generating reanalysis fields
Discussion
