Abstract

Resolving the spatial variability of ambient air pollutants and quantifying contributing factors are critical to human exposure assessment and effective pollution control. Data-driven techniques have been employed in air quality modeling because they capture complex relationships in data and are fast and easy to implement. In this study, we addressed two issues concerning model evaluation and interpretability by applying two common data-driven approaches, linear regression (LR) and random forest (RF), with potentially predictive land-use variables to predict spatial variations of air pollution in an urban setting. The data came from measurements of ambient nitrogen dioxide (NO2) concentrations in the Greater Vancouver Regional District in Canada. First, we showed that model performance is sensitive to the division of training and test sets. Applying a limited number of hold-out validations or cross-validations and reporting the mean model metrics cannot capture this variability or fairly evaluate model performance. We proposed repeated cross-validation (RCV) as a reliable evaluation method that accounts for both the mean and the variance of performance. Second, there is no consistent approach to measuring the importance of predictor variables and quantifying their contributions across different types of data-driven models. Traditional approaches only reflect the relative importance of predictor variables in terms of predictive power, without quantifying their contributions to the model output. We proposed applying SHapley Additive exPlanations (SHAP), an explanation method rooted in coalitional game theory, as a unifying framework to interpret and compare different types of data-driven methods. We showed that SHAP is capable of 1) calculating each predictor variable's contribution to each data point and 2) ranking the importance of predictor variables in terms of their contributions to the model output.
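The repeated cross-validation (RCV) idea described above can be sketched with scikit-learn's `RepeatedKFold`: repeating k-fold splits yields a distribution of scores, so both the mean and the variance of model performance can be reported. This is a minimal illustration, not the study's actual pipeline; the synthetic data, predictor count, and hyperparameters below are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic stand-in for the NO2 data: 200 monitoring sites, 5 land-use predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=200)

# Repeated 5-fold CV: 10 repeats give 50 R^2 scores per model, a distribution
# rather than a single hold-out number.
rcv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
for name, model in [("LR", LinearRegression()),
                    ("RF", RandomForestRegressor(n_estimators=50, random_state=0))]:
    scores = cross_val_score(model, X, y, cv=rcv, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}, sd = {scores.std():.3f}")
```

Reporting the standard deviation alongside the mean is what distinguishes this from a single train/test split, whose score depends heavily on which points land in the test set.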
The results indicated that different models may favor different predictor variables and thus yield different interpretations.
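The Shapley-value idea underlying SHAP can be illustrated without the `shap` package by computing exact Shapley values by brute force for a toy model. This is a sketch of the game-theoretic definition, not the authors' method: "absent" features are replaced by a baseline (here, zeros standing in for feature means), which is one common convention in SHAP-style explanations.

```python
import itertools
import math

import numpy as np

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one data point.

    Each feature's value is the weighted average, over all feature subsets S,
    of the change in prediction when that feature is added to S. Features not
    in S are held at the baseline.
    """
    n = len(x)
    phi = np.zeros(n)

    def f(subset):
        z = baseline.copy()
        z[list(subset)] = x[list(subset)]
        return predict(z.reshape(1, -1))[0]

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                w = math.factorial(len(S)) * math.factorial(n - len(S) - 1) / math.factorial(n)
                phi[i] += w * (f(S + (i,)) - f(S))
    return phi

# Toy linear model, so the contributions are easy to verify by hand:
# phi_i = w_i * (x_i - baseline_i).
w = np.array([2.0, -1.0, 0.5])
predict = lambda X: X @ w
baseline = np.zeros(3)
x = np.array([1.0, 2.0, 3.0])
phi = shapley_values(predict, x, baseline)

# Local accuracy: per-point contributions sum to the prediction minus the
# baseline prediction, which is what makes ranking and aggregation meaningful.
print(phi, phi.sum())
```

The brute-force sum is exponential in the number of features; in practice the `shap` library uses model-specific approximations (e.g. for tree ensembles), but the additivity property demonstrated here is the same.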