In view of the spatial and temporal gaps left by the ground monitoring network for fine particulate matter (PM2.5), remote sensing is regarded as an effective alternative for mapping the PM2.5 distribution at high resolution. Satellite-derived aerosol optical depth (AOD) correlates well with the PM2.5 concentration. In this study, a modified deep learning model was established to estimate hourly PM2.5 concentrations in the Yangtze River Delta Urban Agglomeration (YRDUA) region based on 5-km AOD data from Himawari-8. The model combined density-based spatial clustering of applications with noise (DBSCAN) and a deep neural network (DNN) (denoted as DBSCAN–DNN) to construct individualized DNNs for the retrieval of PM2.5. The DBSCAN algorithm was used to identify outlying data using the relationship between AOD–NO2 and PM2.5. The cluster analysis divided the inputs into separate clusters with distinct pollution levels, which served as a reference for constructing the individualized DNNs. The results showed that, given identical inputs, the DBSCAN–DNN model greatly improved the estimation accuracy of the PM2.5 concentration compared with the pure DNN model. The 10-fold cross-validation R-value increased by over 30%, with the highest R-value reaching 0.94 for the 2018 Shanghai dataset. The root-mean-square prediction error (RMSE) was also reduced by over 30%. In addition, the model performed well in generating hourly spatial estimations, revealing more detail about the dynamic changes of the PM2.5 concentration during the day. Moreover, comparisons of five regional estimation results showed that the model is applicable to datasets of differing volume and complexity. This study not only proposes a new method to achieve PM2.5 estimations with higher accuracy and spatiotemporal resolution, but also provides a new perspective for exploiting the sophisticated correlations among large environmental datasets by introducing pre-clustering into the deep learning approach.
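The pre-clustering idea can be illustrated with a minimal sketch. It is not the authors' implementation: it runs a plain-Python DBSCAN over made-up, already normalized (AOD, NO2, PM2.5) triples, flags outliers as noise, and groups the remaining samples by cluster so that a separate model could be trained on each group. The values of `eps` and `min_samples`, the sample data, and the helper names `region_query`, `dbscan`, and `split_by_cluster` are all assumptions for illustration. Training the per-cluster DNNs is left as a comment.

```python
import math

NOISE = -1  # label given to outlying samples


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Plain DBSCAN: one label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep growing
        cluster += 1
    return labels


def split_by_cluster(samples, labels):
    """Group samples by cluster label, dropping noise."""
    groups = {}
    for sample, label in zip(samples, labels):
        if label != NOISE:
            groups.setdefault(label, []).append(sample)
    return groups


# Hypothetical normalized (AOD, NO2, PM2.5) samples: a low-pollution group,
# a high-pollution group, and one sample that fits neither.
samples = [
    (0.10, 0.10, 0.10), (0.12, 0.10, 0.11), (0.11, 0.12, 0.10), (0.09, 0.11, 0.12),
    (0.80, 0.80, 0.80), (0.82, 0.79, 0.81), (0.81, 0.82, 0.80), (0.79, 0.81, 0.82),
    (0.10, 0.90, 0.50),
]

labels = dbscan(samples, eps=0.1, min_samples=3)  # assumed parameters
groups = split_by_cluster(samples, labels)

for label, members in sorted(groups.items()):
    # A separate regressor (a DNN in the study) would be trained here per cluster.
    print(f"cluster {label}: {len(members)} samples")
```

In practice the clustering parameters would have to be tuned to the data, and new samples need a rule for choosing which cluster's model to apply; neither is specified here.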