Air Pollutant Concentration Forecasting Using Long Short-Term Memory Based on Wavelet Transform and Information Gain: A Case Study of Beijing.

Bingchun Liu, Mingzhao Lai, Xiaoling Guo, Qingshan Wang, Cornelio Yáñez-Márquez

doi:10.1155/2020/8834699

Abstract

Forecasting air pollutant concentrations is an effective way to protect public health, because it allows warnings to be issued about harmful air contaminants. In this study, a hybrid prediction model is built from information gain, the wavelet decomposition transform, and a long short-term memory (LSTM) neural network, and applied to predicting the daily concentrations of six atmospheric pollutants (PM2.5, PM10, SO2, NO2, O3, and CO) in Beijing. First, information gain is used for feature selection on the collected raw data, yielding a set of factors strongly related to the prediction target. Then, the historical time series of daily air pollutant concentrations is decomposed into different frequency components with a wavelet decomposition transform, and the components are recombined into a high-dimensional training data set. Finally, the LSTM prediction model is trained on the high-dimensional data set, and its parameters are tuned through repeated tests to obtain the optimal prediction model. The study uses concentration data for the six air pollutants in Beijing from 1/1/2014 to 31/12/2016, and the Beijing data from 1/1/2017 to 31/12/2017 were used to test the model's predictive ability. The results show that the mean absolute percentage error (MAPE) of the model's predictions is 7.45%. The hybrid model is therefore of practical value for atmospheric pollutant concentration prediction, offering high accuracy and stability when forecasting future concentrations.
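The feature selection step can be illustrated with a small sketch. The abstract does not say how continuous variables are discretized before information gain is computed, so the equal-width binning below, along with the function names `entropy`, `information_gain`, and `equal_width_bins`, is an assumption made for illustration and is not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, target):
    """IG(target; feature) = H(target) - H(target | feature), both discrete."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    # Weighted average of the target entropy within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def equal_width_bins(values, n_bins):
    """Assumed discretization: map continuous values to equal-width bin indices."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def rank_features(features, target):
    """Sort feature names by information gain, highest first.

    `features` maps a (hypothetical) feature name to its discretized values.
    """
    scores = {name: information_gain(vals, target) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In practice one would bin each candidate factor and the pollutant concentration, call `rank_features`, and keep the factors whose gain exceeds a chosen threshold; the threshold itself is a modelling choice that the abstract does not specify.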

Highlights

Forecasting air pollutant concentrations is an effective way to protect public health, because it allows warnings to be issued about harmful air contaminants.
The long short-term memory (LSTM) neural network prediction model is trained on high-dimensional data sets, and its parameters are tuned through repeated tests to obtain the optimal prediction model. The study uses concentration data for six air pollutants in Beijing from 1/1/2014 to 31/12/2016, and the Beijing data from 1/1/2017 to 31/12/2017 were used to test the model's predictive ability. The results show that the mean absolute percentage error (MAPE) of the model's predictions is 7.45%. The hybrid model is therefore of practical value for atmospheric pollutant concentration prediction, offering high accuracy and stability when forecasting future concentrations.
Predictions of significant air pollution concentrations can serve as a policy tool for the environmental protection department to regulate social and economic activities such as transportation, industry, and urban construction under extreme air pollution conditions [3]. Therefore, to support environmental management decisions and avoid serious accidents caused by air pollution, there is an urgent need for a precise and stable model that predicts future air pollutant concentrations, helping the government issue control measures for air pollutants and carry out public health protection work.
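The MAPE figure quoted in the highlights can be read against the standard definition of the metric, the mean of |actual - predicted| / |actual| expressed as a percentage. The sketch below assumes that standard definition; the function name `mape` and the sample numbers are illustrative and are not taken from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Assumes the standard definition; raises if an actual value is zero,
    because the percentage error is undefined there.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs((a - p) / a)
    return 100.0 * total / len(actual)


# Made-up daily concentrations, purely to show the call.
observed = [100.0, 200.0]
forecast = [90.0, 220.0]
error_percent = mape(observed, forecast)
```

Because each error is divided by the observed value, days with very low concentrations can dominate the average, which is worth keeping in mind when comparing MAPE across pollutants with different typical ranges.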

Summary

Introduction

Forecasting air pollutant concentrations is an effective way to protect public health, because it allows warnings to be issued about harmful air contaminants. A hybrid prediction model has been built from information gain, the wavelet decomposition transform, and an LSTM neural network, and applied to predicting the daily concentrations of atmospheric pollutants (PM2.5, PM10, SO2, NO2, O3, and CO) in Beijing. The neural network model gives better predictions than the linear regression model on the same input parameters, and it can fully capture the temporal variation of air pollutant concentration in a specific scene. These models usually share a common defect: their ability to predict the concentration of particulate matter such as PM2.5 and PM10 decreases at very high concentrations [5]. LSTM has been successfully applied to image classification, natural language processing, human motion recognition, robot intelligence development, and oil price forecasting [17,18,19]. Therefore, building on the ability of LSTM to analyze and predict spatiotemporal data, this study applies it to air pollution prediction and obtains good performance.
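To make the wavelet step of the hybrid model concrete, the sketch below decomposes a series into frequency bands and rebuilds each band at full length so that the bands can be stacked into a multi-dimensional input. This summary does not state which wavelet or how many levels the authors use, so the Haar wavelet, the two-level depth in the example, and all function names here are assumptions chosen to keep the example dependency-free.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in pairs]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level, doubling the length."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def decompose(signal, levels):
    """Return [approx_L, detail_L, ..., detail_1] (coarsest band first)."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def band_components(signal, levels):
    """Rebuild each band alone at full length; the bands sum to the signal."""
    coeffs = decompose(signal, levels)
    components = []
    for keep in range(len(coeffs)):
        # Zero every band except the one being reconstructed.
        masked = [c if i == keep else [0.0] * len(c) for i, c in enumerate(coeffs)]
        approx = masked[0]
        for detail in masked[1:]:
            approx = haar_inverse_step(approx, detail)
        components.append(approx)
    return components


# Made-up series whose length is divisible by 2**levels.
series = [3.0, 5.0, 8.0, 6.0, 4.0, 2.0, 7.0, 9.0]
bands = band_components(series, levels=2)
# One feature vector per day: (trend, coarse detail, fine detail).
features = list(zip(*bands))
```

Each row of `features` could then be fed to a sequence model alongside the selected factors. A production pipeline would more likely rely on a dedicated wavelet library and a smoother wavelet family than on this hand-rolled Haar transform.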

Methods

Results

Conclusion