Predicting PM2.5 in the Northeast China Heavy Industrial Zone: A Semi-Supervised Learning with Spatiotemporal Features

Hongxun Jiang, Caihong Sun, Xiaotong Wang

doi:10.3390/atmos13111744

Open Access

Journal: Atmosphere
Publication Date: Oct 23, 2022
Citations: 2
License type: CC BY 4.0

Affiliation: Renmin University of China

Abstract

PM2.5 particulate pollution affects the Chinese population, particularly in cities such as Shenyang in northeastern China, which hosts a number of traditional heavy industries. This paper proposes a semi-supervised learning model for predicting PM2.5 concentrations. The model draws on real-world data from 11 air quality monitoring stations in Shenyang and nearby cities. The data are of three types: air quality monitoring data, meteorological data, and spatiotemporal information (such as the spatiotemporal effects of PM2.5 emissions and diffusion across different geographical regions). The model consists of two components: genetic programming (GP) to forecast PM2.5 concentrations and support vector classification (SVC) to predict trends. Experimental results show that the proposed model is more accurate than the baseline models, with improvements of 3% to 18% over classic multivariate linear regression (MLR), 1% to 11% over a multi-layer perceptron neural network (MLP-ANN), and 21% to 68% over support vector regression (SVR). Furthermore, the GP approach provides an intuitive analysis of how individual factors contribute to PM2.5 concentrations. Data from backtracking points adjacent to other monitoring stations are critical for forecasting at the shorter interval (1 h), whereas wind speed matters more at longer intervals (6 and 24 h).
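
The abstract describes two prediction targets at three horizons (1, 6, and 24 h): a future concentration and a trend. As a rough illustration only, the sketch below shows one way an hourly PM2.5 series could be turned into samples carrying both targets. The function name `make_samples`, the three-reading lag window, the "up versus not up" trend label, and the synthetic series are all assumptions made for this example; the abstract does not specify the paper's actual feature construction, and this sketch omits the meteorological and spatial inputs.

```python
from typing import List, Tuple

Sample = Tuple[List[float], float, int]


def make_samples(series: List[float], horizon: int, n_lags: int = 3) -> List[Sample]:
    """Build (lag features, future value, trend label) samples from an hourly series.

    Hypothetical helper: the window length and label rule are assumptions,
    not the paper's method.
    """
    samples: List[Sample] = []
    # t is the index of the latest observed reading for each sample.
    for t in range(n_lags - 1, len(series) - horizon):
        lags = series[t - n_lags + 1 : t + 1]    # last n_lags readings, oldest first
        future = series[t + horizon]             # regression target (concentration)
        trend = 1 if future > series[t] else 0   # classification target (1 = rising)
        samples.append((lags, future, trend))
    return samples


# Synthetic 48-hour series, for illustration only (not real monitoring data).
pm25 = [float(30 + (i * 7) % 20) for i in range(48)]

for h in (1, 6, 24):
    print(f"horizon {h} h: {len(make_samples(pm25, h))} samples")
```

In a setup like this, the future value would feed a regressor (GP in the paper) and the trend label a classifier (SVC in the paper). Longer horizons leave fewer usable samples from the same series.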

Full Text