Predicting PM2.5 Concentrations Across USA Using Machine Learning

P Preetham Vignesh, Jonathan H Jiang, P Kishore

doi:10.1029/2023ea002911

Abstract

Economic growth, air pollution, and forest fires in some U.S. states have increased the concentration of particulate matter with a diameter less than or equal to 2.5 μm (PM2.5). Previous studies have tried to observe PM2.5 both spatially and temporally using aerosol remote sensing and geostatistical estimation, but their accuracy was limited by coarse resolution. In this paper, the performance of nine machine learning models in predicting PM2.5 is assessed using PM2.5 station data from 2017 to 2021. The models are linear regression (LR), decision tree (DT), gradient boosting regression (GBR), AdaBoost regression (ABR), XGBoost (XGB), k‐nearest neighbors (K‐NN), long short‐term memory (LSTM), random forest (RF), and support vector machine (SVM). To compare the accuracy of the nine models, the coefficient of determination (R2), root mean square error (RMSE), Nash‐Sutcliffe efficiency (NSE), root mean square error ratio (RSR), and percent bias (PBIAS) were evaluated. Among all nine models, the RF (100 decision trees with a maximum depth of 20) and support vector regression (SVR, the regression form of SVM, with a nonlinear degree 3 polynomial kernel) models predicted PM2.5 concentrations best. Additionally, a comparison of the performance metrics showed that the models predicted PM2.5 better in the western United States than in the eastern United States.
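The five evaluation metrics named above can be made concrete with a short sketch. The Python function below is a minimal illustration under stated assumptions, not the authors' code: `evaluation_metrics` is a hypothetical name, R2 is taken to be the squared Pearson correlation (if it were instead defined as 1 - SSE/SST it would coincide with NSE), and PBIAS follows the common (observed - predicted) sign convention, in which a positive value indicates underestimation.

```python
import numpy as np


def evaluation_metrics(observed, predicted):
    """Return R2, RMSE, NSE, RSR and PBIAS for paired PM2.5 values.

    Hypothetical helper. Assumptions (not taken from the paper):
    R2 is the squared Pearson correlation, and PBIAS uses the
    (observed - predicted) sign convention, so a positive value
    means the model underestimates on average.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    if obs.shape != pred.shape or obs.size < 2:
        raise ValueError("need two equal-length arrays with at least 2 values")

    residuals = obs - pred
    sse = np.sum(residuals ** 2)             # sum of squared errors
    sst = np.sum((obs - obs.mean()) ** 2)    # spread of the observations (must be > 0)

    rmse = np.sqrt(sse / obs.size)           # root mean square error
    nse = 1.0 - sse / sst                    # Nash-Sutcliffe efficiency
    rsr = np.sqrt(sse) / np.sqrt(sst)        # RMSE divided by the std of observations
    pbias = 100.0 * np.sum(residuals) / np.sum(obs)  # percent bias
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2   # squared Pearson correlation (assumed R2)

    return {"R2": r2, "RMSE": rmse, "NSE": nse, "RSR": rsr, "PBIAS": pbias}


# Illustrative values only (not data from the study).
metrics = evaluation_metrics([10, 12, 8, 15], [11, 11, 9, 14])
print({name: round(float(value), 3) for name, value in metrics.items()})
```

With these definitions RSR squared equals 1 - NSE, so the two carry overlapping information; a lower RSR and an NSE closer to 1 both indicate a better fit, while PBIAS separately shows whether a model tends to over- or underpredict.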
