Abstract

Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have allowed us to extend the geographic range and improve the accuracy of exposure models, making them valuable tools for conducting health studies and identifying pollution hotspots. Here, we created a prediction model for daily PM2.5 levels in the Greater London area from 1st January 2005 to 31st December 2013 using an ensemble machine learning approach that incorporates satellite aerosol optical depth (AOD), land use, and meteorological data. Predictions were made at a 1 km × 1 km resolution over 3960 grid cells. The ensemble combined predictions from three machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. The ensemble model performed well, with a ten-fold cross-validated R2 of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. The model was particularly adept at predicting day-to-day changes in PM2.5 levels, with an out-of-sample temporal R2 of 0.882. However, its ability to predict spatial variability was weaker, with a spatial R2 of 0.396. We believe this is due to the relatively small spatial variation in pollutant levels across this area.
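The abstract reports separate temporal and spatial R2 values but does not define the split here. One convention common in PM2.5 exposure-modelling studies, assumed for this sketch, computes spatial R2 from each monitor's mean observed level versus its mean predicted level, and temporal R2 from daily deviations around those site means. The minimal Python sketch below uses squared Pearson correlation as R2; the function names and the (site_id, observed, predicted) record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def r2(obs, pred):
    """Squared Pearson correlation between observed and predicted values.

    Raises ZeroDivisionError if either series is constant.
    """
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def spatial_temporal_r2(records):
    """Split model skill into spatial and temporal parts (assumed convention).

    records: iterable of (site_id, observed, predicted) daily values.
    Returns (spatial_r2, temporal_r2).
    """
    by_site = defaultdict(list)
    for site, o, p in records:
        by_site[site].append((o, p))

    # Spatial component: per-site mean observed vs per-site mean predicted.
    site_obs = [mean(o for o, _ in v) for v in by_site.values()]
    site_pred = [mean(p for _, p in v) for v in by_site.values()]

    # Temporal component: daily deviations from each site's own mean.
    d_obs, d_pred = [], []
    for v, mo, mp in zip(by_site.values(), site_obs, site_pred):
        for o, p in v:
            d_obs.append(o - mo)
            d_pred.append(p - mp)

    return r2(site_obs, site_pred), r2(d_obs, d_pred)
```

Because each site's own mean is subtracted before the temporal comparison, a model that tracks daily fluctuations well but misplaces site-level averages can combine a high temporal R2 with a low spatial R2, which is the pattern the abstract describes.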

Highlights

  • Environmental research has long faced challenges in exposure assessment for studies of air pollutants

  • We incorporate aerosol optical depth (AOD), land-use data, and meteorological data to predict PM2.5 levels at a 1 km × 1 km resolution in the Greater London area from 1st January 2005 to 31st December 2013, using an ensemble model and four machine learning algorithms calibrated with data from a wide network of monitors

  • Elevation data were obtained from the CGIAR Consortium for Spatial Information, which used Shuttle Radar Topography Mission (SRTM) data from the United States Geological Survey (USGS) and NASA
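The highlights describe an ensemble built on several base learners calibrated against monitor data, but the combining rule is not stated here. One simple option, shown purely as an assumed illustration, is linear stacking: fit weights for the RF, GBM, and KNN predictions by ordinary least squares against monitored PM2.5. The stdlib-only sketch below solves the normal equations directly; `solve`, `stack_weights`, and `ensemble_predict` are hypothetical helpers, not the authors' code.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def stack_weights(base_preds, observed):
    """Least-squares weights (no intercept) for combining base-learner predictions.

    base_preds: one list of predictions per learner (e.g. RF, GBM, KNN).
    Assumed combining rule; not confirmed by the source.
    """
    k = len(base_preds)
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(p * q for p, q in zip(base_preds[i], base_preds[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(p * y for p, y in zip(base_preds[i], observed)) for i in range(k)]
    return solve(xtx, xty)


def ensemble_predict(weights, preds_row):
    """Weighted sum of one grid cell-day's base-learner predictions."""
    return sum(w * p for w, p in zip(weights, preds_row))
```

Unconstrained least-squares weights can turn negative when base learners are highly correlated, so many stacking implementations constrain the weights to be non-negative and fit them on out-of-fold predictions to limit overfitting.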


Summary

Introduction

Environmental research has long faced challenges in exposure assessment for studies of air pollutants. Novel machine learning techniques allow us to create more accurate and flexible models that can combine remote sensing, land use, meteorological, and chemical transport model (CTM) inputs. They are also better at incorporating temporal variation than standard land use regression (LUR) models. Machine learning algorithms allow us to examine the relationship between the predictors of pollutant concentrations and measured concentrations non-parametrically [28,31,32,33,34,35,36]. We incorporate AOD, land-use data, and meteorological data to predict PM2.5 levels at a 1 km × 1 km resolution in the Greater London area from 1st January 2005 to 31st December 2013, using an ensemble model and four machine learning algorithms calibrated with data from a wide network of monitors.
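To make the non-parametric idea concrete, the sketch below implements a k-nearest-neighbour regressor, one of the three learners named in the abstract, and scores it with ten-fold cross-validation using R2 = 1 - SSE/SST on the pooled out-of-fold predictions. The value k = 5, Euclidean distance on raw predictors, and random fold assignment are assumptions for illustration; the study's actual features, hyperparameters, and fold scheme are not described here, and all function names are hypothetical.

```python
import math
import random


def knn_predict(train_x, train_y, x, k=5):
    """Average the targets of the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def r2_score(obs, pred):
    """Coefficient of determination: 1 - SSE / SST."""
    m = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - m) ** 2 for o in obs)
    return 1 - sse / sst


def cross_validated_r2(xs, ys, n_folds=10, k=5, seed=0):
    """Out-of-sample R2 from n-fold CV; k=5 and random folds are assumed defaults."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    obs, pred = [], []
    for held_out in folds:
        held = set(held_out)
        # Fit (i.e. store) the KNN model on every point outside this fold.
        train = [i for i in idx if i not in held]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        # Predict each held-out point using only the training points.
        for i in held_out:
            obs.append(ys[i])
            pred.append(knn_predict(tx, ty, xs[i], k))
    return r2_score(obs, pred)
```

KNN makes no assumption about the functional form linking predictors to concentrations; it simply averages nearby observations. In practice, predictors on different scales (e.g. AOD versus temperature) would usually be standardised first so that no single variable dominates the distance.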

Materials and Methods
Machine Learning Algorithms
Input Variables
Data Sources
Predictions
Results
Conclusions
