Abstract
The global environment has become more polluted as a result of rapid industrial development. Existing machine learning methods for air quality prediction, however, fail to analyze why air pollutant concentrations change, because most of them focus on model selection. Because recent deep learning frameworks are very flexible, a model may become deep and complex in order to fit the dataset. A single deep neural network with a large number of weights may therefore overfit. In addition, stochastic gradient descent (SGD) applies the same learning rate to all parameters, which can cause training to settle in a local optimum. In this paper, the Pearson correlation coefficient is used to analyze the correlation between PM2.5 and auxiliary data such as meteorological data, season data, and time stamp data; these data are then used to cluster the dataset in order to improve prediction performance. The extracted features are used to build a deep ensemble network (EN) model that combines a recurrent neural network (RNN), a long short-term memory (LSTM) network, and a gated recurrent unit (GRU) network to predict the PM2.5 concentration of the next hour. The weight of each submodel changes with its accuracy on the validation set, which improves the generalization ability of the ensemble. Adaptive moment estimation (Adam), an algorithm for stochastic optimization, is used instead of SGD to optimize the network weights. To compare the overall performance of the different algorithms, the mean absolute error (MAE) and the mean absolute percentage error (MAPE) are used as accuracy metrics in the experiments. The experimental results show that the proposed method achieves an MAE of 6.19 and a MAPE of 16.20% and outperforms the comparison models.
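The abstract says only that each submodel's weight changes with its accuracy on the validation set; it does not give the exact weighting rule. The sketch below therefore assumes one plausible rule (weights proportional to the inverse of each submodel's validation MAE) and shows how the two reported metrics are computed. The function names and the weighting rule are illustrative assumptions, not the authors' implementation.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no zero targets)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def inverse_error_weights(val_errors):
    """Assumed rule: weight each submodel by 1 / validation MAE, normalized to sum to 1."""
    inverse = [1.0 / e for e in val_errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def ensemble_predict(predictions, weights):
    """Weighted average of the submodels' predictions, one value per time step."""
    n_steps = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions)) for i in range(n_steps)]


if __name__ == "__main__":
    # Toy validation data (made-up numbers, only to exercise the functions).
    y_val = [50.0, 80.0, 100.0]
    val_preds = [
        [55.0, 70.0, 110.0],  # stand-in for the RNN submodel
        [52.0, 78.0, 104.0],  # stand-in for the LSTM submodel
        [48.0, 83.0, 97.0],   # stand-in for the GRU submodel
    ]
    weights = inverse_error_weights([mae(y_val, p) for p in val_preds])
    combined = ensemble_predict(val_preds, weights)
    print(weights, mae(y_val, combined), mape(y_val, combined))
```

Under this assumed rule, a submodel with a lower validation error contributes more to the combined prediction; other schemes (for example softmax over negative errors) would fit the same description.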
Highlights
In recent years, rapid industrial development has been accompanied by air pollution, which causes 7 million deaths every year and has attracted great attention worldwide [1, 2]
The contributions of this study are summarized as follows: (i) this study proposes an ensemble model based on a recurrent neural network (RNN), long short-term memory (LSTM), and a gated recurrent unit (GRU) to predict the PM2.5 concentration of the next hour; (ii) this study proposes a clustering method based on wind direction to improve prediction performance; (iii) wind direction is shown to be related to PM2.5 concentration, because the wind can carry PM2.5 into or away from an area
To validate the method, several models were used as comparison algorithms: the spatiotemporal deep learning (STDL) model, the time delay neural network (TDNN) model, the autoregressive moving average (ARMA) model, the support vector regression (SVR) model, and the traditional LSTM network [14]. The results demonstrated the superiority of the proposed method
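The highlights mention clustering by wind direction but do not describe the procedure. As a rough illustration only, the sketch below assumes the simplest possible variant: each sample is assigned to one of eight compass sectors, so that a separate model could be fitted per group. The sector boundaries, the `wind_deg` field name, and the sample layout are hypothetical.

```python
from collections import defaultdict

# Eight 45-degree compass sectors, centred on N, NE, E, ... (an assumed partition).
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def wind_sector(degrees):
    """Map a wind direction in degrees (0 = north, clockwise) to a sector label."""
    index = int(((degrees % 360) + 22.5) // 45) % 8
    return SECTORS[index]


def group_by_wind(samples):
    """Group samples (dicts with a hypothetical 'wind_deg' key) by wind sector."""
    groups = defaultdict(list)
    for sample in samples:
        groups[wind_sector(sample["wind_deg"])].append(sample)
    return dict(groups)


if __name__ == "__main__":
    # Made-up samples, only to show the grouping.
    samples = [
        {"wind_deg": 350.0, "pm25": 80.0},
        {"wind_deg": 10.0, "pm25": 75.0},
        {"wind_deg": 180.0, "pm25": 30.0},
    ]
    for sector, members in group_by_wind(samples).items():
        print(sector, len(members))
```

A data-driven clustering (for example k-means on wind features) would also match the description; the fixed sectors here are chosen only because they are easy to follow.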
Summary
Rapid industrial development has been accompanied by air pollution, which causes 7 million deaths every year and has attracted great attention worldwide [1, 2]. The main machine learning methods applied to air pollution are artificial neural networks (ANN), ensemble learning, support vector machines (SVM), and other hybrid models [8]. These existing methods do not analyze why air pollutant concentrations change, because most of them focus on model selection and ignore the causes of the change. By analyzing the correlation between PM2.5 and auxiliary data from the preceding hour, air pollution characteristics are extracted and used to cluster the dataset and to build a deep ensemble network (EN) model that predicts the PM2.5 concentration.
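The correlation analysis described above can be sketched as follows: the Pearson coefficient is computed between PM2.5 at hour t and an auxiliary variable at hour t - 1. The helper names and the toy series are assumptions made for illustration; the study's actual preprocessing is not described here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def lagged_pearson(target, feature, lag=1):
    """Correlate target[t] with feature[t - lag] (lag measured in hours)."""
    return pearson(target[lag:], feature[:len(feature) - lag])


if __name__ == "__main__":
    # Made-up hourly series, only to show the call pattern.
    pm25 = [35.0, 40.0, 52.0, 61.0, 58.0, 47.0]
    humidity = [60.0, 66.0, 72.0, 70.0, 63.0, 59.0]
    print(lagged_pearson(pm25, humidity, lag=1))
```

Variables whose lagged coefficient has a large absolute value would be the natural candidates to keep as input features.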