Explainable Machine Learning Reveals Capabilities, Redundancy, and Limitations of a Geospatial Air Quality Benchmark Dataset

Scarlet Stadtler,Clara Betancourt,Ribana Roscher

doi:10.3390/make4010008

Abstract

Air quality is relevant to society because it poses environmental risks to humans and nature. We use explainable machine learning in air quality research by analyzing model predictions in relation to the underlying training data. The data originate from worldwide ozone observations, paired with geospatial data. We use two different architectures: a neural network and a random forest trained on various geospatial data to predict multi-year averages of the air pollutant ozone. To understand how both models function, we explain how they represent the training data and derive their predictions. By focusing on inaccurate predictions and explaining why these predictions fail, we can (i) identify underrepresented samples, (ii) flag unexpected inaccurate predictions, and (iii) point to training samples irrelevant for predictions on the test set. Based on the underrepresented samples, we suggest where to build new measurement stations. We also show which training samples do not substantially contribute to the model performance. This study demonstrates the application of explainable machine learning beyond simply explaining the trained model.

Highlights

The topographyrelated features are of relevance to the neural network. Both models attribute some importance to the forest in the surrounding 25 km area, while for the neural network, this feature is two times as important as it is for the random forest
It is a somewhat stronger assumption for the neural network, where we cannot verify if the training stations we identified as k-nearest neighbors in the representation space are the stations on which the prediction on the test sample is based
We present various ways of using explainable machine learning to understand the core functionality of different machine learning models to support our understanding of the underlying dataset

Summary

Introduction

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. Air pollution poses a significant environmental risk to human health, leading to. Air quality monitoring networks are established in many countries to warn the public, monitor compliance to regulations concerning air pollutant emissions, and analyze observations to assist with the development of new regulations [2,3]. Tropospheric ozone is a toxic air pollutant. In contrast to stratospheric ozone, which protects humans and plants from harmful ultraviolet radiation, tropospheric, near-surface ozone harms humans and plants. It is a greenhouse gas [4]

Objectives

Methods

Results

Discussion

Conclusion

Full Text

Paper version not known

Open DOI Link

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Journal: Machine Learning and Knowledge Extraction	Publication Date: Feb 11, 2022
Citations: 10	License type: CC BY 4.0

R Discovery Prime

R Discovery Prime

Explainable Machine Learning Reveals Capabilities, Redundancy, and Limitations of a Geospatial Air Quality Benchmark Dataset

Abstract

Highlights

Summary

Talk to us

Similar Papers

More From: Machine Learning and Knowledge Extraction

Lead the way for us

Similar Papers

Reply on RC2
Ranjeet Sokhi
-
Ranjeet SokhiRanjeet Sokhi
17 Dec 2021
17 Dec 2021

Reply on RC1
Ranjeet Sokhi
-
Ranjeet SokhiRanjeet Sokhi
17 Dec 2021
17 Dec 2021

Reply on RC1
Ranjeet Sokhi
-
Ranjeet SokhiRanjeet Sokhi
17 Dec 2021
17 Dec 2021

Advances in air quality research – current and emerging challenges
Ranjeet S Sokhi ...
Atmospheric Chemistry and Physics | VOL. 22
Ranjeet S Sokhi, et. al.Ranjeet S Sokhi ...
11 Apr 2022
Atmospheric Chemistry and Physics | VOL. 22

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Explainable Machine Learning Reveals Capabilities, Redundancy, and Limitations of a Geospatial Air Quality Benchmark Dataset

Abstract

Highlights

Summary

Talk to us

Similar Papers

More From: Machine Learning and Knowledge Extraction