Spatiotemporal modeling of PM10 via committee method with in-situ and large scale information: Coupling of machine learning and statistical methods

Yasaman Mohammadi, Omid Zandi, Mohsen Nasseri, Yousef Rashidi

doi:10.1016/j.uclim.2023.101494

Abstract

The aim of this paper is to characterize the spatiotemporal distribution of daily PM10 at 1 km spatial resolution over Tehran, the most polluted city in Iran, using machine learning (ML) models, namely Random Forest (RF) and Gaussian Process Regression (GPR), to support policy-makers. The ML models, which were calibrated using large-scale information and different spatially interpolated in-situ information, were compared against a benchmark station-based interpolator, Inverse Distance Weighting (IDW), using various statistical metrics. Using a hold-out approach with 70% of the available data for training, the Kling-Gupta Efficiency (KGE) values at the validation (training) sites were 0.60 (0.56), 0.61 (0.72), and 0.68 (0.6) for GPR, RF, and IDW, respectively. The seasonal assessment of the validation gauges showed that all models performed similarly well in spring and summer, whereas IDW was more accurate in winter and the ML models were more accurate in autumn. Furthermore, the results of Correlated Triple Collocation (CTC) implied that the ML-based techniques provided a more accurate spatial distribution over the computational grids. Evaluation of the resulting daily mean maps showed that IDW produced “Bull's eyes” around the monitoring stations. Although IDW yielded reasonable site-based performance, it may not deliver a realistic estimation of the pollutant over the study region.
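The benchmark comparison above rests on two standard pieces: IDW interpolation from station values and the KGE score computed at hold-out sites. The following is a minimal plain-Python sketch of both, under stated assumptions: the IDW power parameter (`power=2.0`) is a common default, not the paper's setting; `kge` implements the original Gupta et al. (2009) form, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2) with alpha = std(sim)/std(obs) and beta = mean(sim)/mean(obs), and the paper may use a modified variant; the function names and all station coordinates and PM10 values are hypothetical. Because IDW weights grow without bound as the distance to a station shrinks (and the estimate equals the station value at the station itself), IDW maps tend to show the bull's eyes noted above.

```python
import math
from statistics import mean, pstdev


def idw(xy_known, values, xy_target, power=2.0):
    """Inverse Distance Weighting estimate at one target point (hypothetical helper).

    Weights are w_i = 1 / d_i**power (power=2.0 is an assumed default).
    If the target coincides with a station, that station's value is returned,
    so IDW is an exact interpolator at the monitoring sites.
    """
    num = 0.0
    den = 0.0
    for (x, y), v in zip(xy_known, values):
        d = math.hypot(x - xy_target[0], y - xy_target[1])
        if d == 0.0:
            return v  # exact interpolation at a station location
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def kge(sim, obs):
    """Kling-Gupta Efficiency, original Gupta et al. (2009) form (assumed variant).

    r: Pearson correlation; alpha: std ratio; beta: mean ratio.
    Raises ZeroDivisionError if either series is constant or mean(obs) == 0.
    """
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    # Population covariance, consistent with pstdev, so r is the Pearson correlation.
    cov = mean((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs))
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mu_s / mu_o
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical example (made-up values, not data from the paper):
# four training stations (x, y in km) with 5 days of PM10 (ug/m3),
# and one hold-out station used for validation.
train_xy = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
train_pm10 = [
    [80, 95, 60, 120, 70],
    [90, 100, 65, 130, 75],
    [70, 85, 55, 110, 60],
    [85, 105, 70, 125, 80],
]
holdout_xy = (5.0, 5.0)
holdout_obs = [82, 97, 63, 121, 71]

# Interpolate each day's field from the training stations to the hold-out site.
pred = [idw(train_xy, [s[t] for s in train_pm10], holdout_xy) for t in range(5)]
print("IDW predictions:", [round(p, 2) for p in pred])
print("Hold-out KGE:", round(kge(pred, holdout_obs), 3))
```

Because the hold-out site in this toy example is equidistant from all four training stations, IDW reduces to the plain mean of their values; a real evaluation would repeat the prediction for every validation station, and could split the scores by season as in the assessment described above.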
