China has experienced persistent fine particulate matter (PM2.5) pollution in recent years, which adversely affects both physical and mental health. High-accuracy, full-coverage PM2.5 products would be of substantial value in formulating effective policies to control PM2.5 pollution. We therefore developed a similarity distance-based space-time random forest (SDSTRF) model to estimate daily PM2.5 concentrations over China by integrating surface measurements, satellite aerosol products, meteorological data, and auxiliary information. The proposed model not only accounts for spatiotemporal heterogeneity but also uses similarity distance to avoid errors caused by outliers. It was validated with three different cross-validation (CV) approaches and showed high and stable accuracy, particularly in the site-based CV, with a coefficient of determination (R²) of 0.87, a root mean square error (RMSE) of 10.68 μg/m³, and a relative RMSE (rRMSE) of 27.48%. In addition, leave-out data were predicted to test the predictive power of the SDSTRF model, yielding a site-based CV R² of 0.80, an RMSE of 12.89 μg/m³, and an rRMSE of 33.01%. Together, these validation results indicate that the SDSTRF model can provide accurate estimates of PM2.5 concentrations at different time scales and that incorporating similarity distance allows it to outperform many other space-time models. The proposed model is a promising tool for air pollution studies based on remote sensing.
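For readers unfamiliar with the validation statistics reported above, the following is a minimal sketch of how R², RMSE, and rRMSE are commonly computed, and of how records can be grouped for a site-based CV. It is not the authors' implementation. The function names `validation_metrics` and `site_based_folds` are hypothetical, and two definitions are assumptions: R² is taken as 1 − SS_res/SS_tot (some studies instead report the squared correlation), and rRMSE is taken as RMSE divided by the mean observed concentration.

```python
import math


def validation_metrics(observed, estimated):
    """Return (R2, RMSE, rRMSE in percent) for paired observed/estimated values.

    Assumed definitions (not confirmed by the source):
      R2    = 1 - SS_res / SS_tot
      rRMSE = 100 * RMSE / mean(observed)
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    rrmse = 100.0 * rmse / mean_obs
    return r2, rmse, rrmse


def site_based_folds(site_ids, k):
    """Assign each record to one of k folds so that all records from the
    same monitoring site land in the same fold (hypothetical helper).

    Sites are assigned round-robin in sorted order; a real study would
    typically shuffle the sites first.
    """
    unique_sites = sorted(set(site_ids))
    fold_of_site = {site: i % k for i, site in enumerate(unique_sites)}
    return [fold_of_site[s] for s in site_ids]
```

Holding out whole sites, rather than individual records, tests how well a model estimates concentrations at locations with no monitor, which is why the site-based figures are often treated as the most demanding of the CV results.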