Generating high spatial resolution exposure estimates from sparse regulatory monitoring data

Yihui Ge, Zhenchun Yang, Yan Lin, Philip K Hopke, Albert A Presto, Meng Wang, David Q Rich, Junfeng Zhang

doi:10.1016/j.atmosenv.2023.120076

Abstract

Random forest algorithms have been used extensively to estimate ambient air pollutant concentrations. However, the accuracy of model predictions can suffer from extrapolation problems when only limited measurement data are available to train the machine learning algorithms. In this study, we developed and evaluated two approaches, both incorporating low-cost sensor data, that enhanced the extrapolating ability of random forest models in areas with sparse monitoring data. The study area was Rochester, NY, the site of a pregnancy cohort study. Daily PM2.5 concentrations from the NAMS/SLAMS sites were used as the response variable in the model, with satellite, meteorological, and land-use variables included as predictors. To improve the base random forest models, we used PM2.5 measurements from a pre-existing low-cost sensor network and then conducted a two-step backward selection to gradually eliminate variables with potential emission heterogeneity from the base models. As a second approach, we introduced the regression-enhanced random forest method into the model development. Finally, contemporaneous urinary 1-hydroxypyrene measurements were used to evaluate the PM2.5 predictions generated by the two approaches. The two-step approach increased the average external validation R2 from 0.49 to 0.65 and decreased the RMSE from 3.56 μg/m3 to 2.96 μg/m3. For the regression-enhanced random forest models, the average external validation R2 was 0.54 and the RMSE was 3.40 μg/m3. We also observed significant and comparable relationships between urinary 1-hydroxypyrene levels and the PM2.5 predictions from both improved models. This PM2.5 estimation strategy could improve the extrapolating ability of random forest models in areas with sparse monitoring data.
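The regression-enhanced random forest idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly described formulation, in which a penalized linear model captures the global trend and a random forest is fitted to its residuals; the abstract does not state which linear model or tuning the authors used. The class name RegressionEnhancedRF, the helper synthetic_pm25, and the synthetic data are hypothetical and stand in for the study's PM2.5 and predictor data, which are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error, r2_score


class RegressionEnhancedRF:
    """Hypothetical sketch: Lasso trend plus a random forest on the Lasso residuals."""

    def __init__(self, n_estimators=200, random_state=0):
        self.linear = LassoCV(cv=5, random_state=random_state)
        self.forest = RandomForestRegressor(
            n_estimators=n_estimators, random_state=random_state
        )

    def fit(self, X, y):
        # Step 1: fit the penalized linear model to the response.
        self.linear.fit(X, y)
        # Step 2: fit the forest to what the linear model could not explain.
        residuals = y - self.linear.predict(X)
        self.forest.fit(X, residuals)
        return self

    def predict(self, X):
        # Final prediction = linear trend + forest-predicted residual.
        return self.linear.predict(X) + self.forest.predict(X)


def synthetic_pm25(X, rng):
    # Made-up response: a linear trend in column 0 plus a nonlinear term in column 1.
    noise = rng.normal(0.0, 0.3, size=len(X))
    return 5.0 + 8.0 * X[:, 0] + np.sin(6.0 * X[:, 1]) + noise


rng = np.random.default_rng(0)

# "Training sites": all three predictors lie in [0, 1].
X_train = rng.uniform(0.0, 1.0, size=(300, 3))
y_train = synthetic_pm25(X_train, rng)

# "External sites": predictor 0 lies outside the training range,
# which forces both models to extrapolate.
X_ext = rng.uniform(0.0, 1.0, size=(100, 3))
X_ext[:, 0] = rng.uniform(1.0, 1.5, size=100)
y_ext = synthetic_pm25(X_ext, rng)

rerf = RegressionEnhancedRF().fit(X_train, y_train)
plain_rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

rerf_pred = rerf.predict(X_ext)
rf_pred = plain_rf.predict(X_ext)

# External validation metrics, analogous in form to the R2 and RMSE in the abstract.
rerf_rmse = float(np.sqrt(mean_squared_error(y_ext, rerf_pred)))
rf_rmse = float(np.sqrt(mean_squared_error(y_ext, rf_pred)))
rerf_r2 = float(r2_score(y_ext, rerf_pred))
rf_r2 = float(r2_score(y_ext, rf_pred))

print(f"plain RF: R2={rf_r2:.2f}, RMSE={rf_rmse:.2f}")
print(f"RERF:     R2={rerf_r2:.2f}, RMSE={rerf_rmse:.2f}")
```

A plain random forest cannot predict beyond the range of responses seen in training, so on this toy data its error grows once predictor 0 leaves the training range, while the linear component of the combined model can follow the trend. The numbers produced here come from synthetic data and say nothing about the values reported in the abstract.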

Journal: Atmospheric Environment
Publication Date: Sep 12, 2023
Citations: 6
