Abstract

Background/Aim: Exposure to fine particulate matter (PM2.5) is linked to adverse health outcomes. Epidemiological studies usually rely on PM2.5 measurements collected from fixed monitors. However, in many countries, including Great Britain, the existing monitoring network provides limited spatio-temporal coverage of PM2.5, with scarce data in small towns and rural areas. Data from satellites, climate and atmospheric reanalysis models, chemical transport models, and geospatial features offer additional information that can be used to reconstruct PM2.5 concentrations, filling the gaps in the ground monitoring network. The aim of this study is to develop and apply a multi-stage satellite-based machine learning (ML) model to estimate daily PM2.5 over a 1 km² grid across Great Britain for 2003-2018.
Methods: We managed, processed, and synchronised data from several sources with different formats, projections, and spatio-temporal resolutions, assembling a dataset with more than 100 billion rows. We then applied a multi-stage random forest (RF) model to obtain daily modelled PM2.5 at 1 km². Stage-1 predicts PM2.5 concentrations at monitors with PM10-only records. Stage-2 imputes satellite aerosol optical depth values that are missing due to cloudiness and bad retrievals. Stage-3 applies the RF algorithm to estimate PM2.5 concentrations using a combined dataset from Stage-1, Stage-2, and a list of spatio-temporally synchronised predictors. Stage-4 predicts daily PM2.5 across the whole of Great Britain using the Stage-3 model.
Results/Discussion: The RF model performed well in all stages. Stage-1 obtained an R² of 0.91 (Ntree=500/mtry=4). Stage-2 and Stage-3 obtained a mean overall R² of 0.93 (Ntree=50/mtry=20) and 0.79 (Ntree=500/mtry=30), respectively. Stage-4 reconstructed approximately 1.5 billion PM2.5 values across Great Britain.
Conclusion: The modelling tools and data developed in this project provide continuous estimates of surface-level PM2.5 across Great Britain, which can then be linked with existing health databases. This will enable an accurate estimation of health risks and impacts linked to both short- and long-term exposures.
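
The R² values reported for each stage summarise how closely modelled values track observations. The abstract does not state which definition of R² was used, so the following is only a minimal sketch assuming the coefficient of determination, 1 - SS_res/SS_tot, computed on held-out data. The function name `r_squared` and the sample values are hypothetical and are not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is one common definition of R²; the study may
    have used another (e.g. squared Pearson correlation).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared errors of the model predictions
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviations of observations from their mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R² is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out daily PM2.5 values (µg/m³), invented for illustration
observed = [8.0, 12.0, 15.0, 9.0, 20.0]
predicted = [9.0, 11.0, 14.0, 10.0, 19.0]
score = r_squared(observed, predicted)
print(f"R² = {score:.2f}")
```

Under this definition a value of 1 means the predictions match the observations exactly, while a value near 0 means the model does no better than predicting the mean of the observations.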
