Abstract

Dust, or particulate matter (PM2.5), is among the most harmful pollutants negatively affecting human health. Predicting indoor PM2.5 concentrations is essential to achieving acceptable indoor air quality. This study investigates data-driven models to accurately predict PM2.5 pollution. Notably, a comparative study has been conducted between twenty-one machine learning and deep learning models for predicting PM2.5 levels. Specifically, we investigate the performance of machine learning and deep learning models in predicting ambient PM2.5 concentrations based on other ambient pollutants, including SO2, NO2, O3, CO, and PM10. Here, we applied Bayesian optimization to optimally tune the hyperparameters of Gaussian process regression with different kernels and of ensemble learning models (i.e., boosted trees and bagged trees), and investigated their prediction performance. Furthermore, to enhance the forecasting performance of the investigated models, dynamic information has been incorporated by introducing lagged measurements in the construction of the considered models. Results show a significant improvement in prediction performance when dynamic information from past data is considered. Moreover, three methods, namely random forest (RF), decision tree, and extreme gradient boosting, are applied to assess variable contributions; they reveal that lagged PM2.5 data contribute significantly to the prediction performance and enable the construction of parsimonious models. Hourly concentration levels of ambient air pollution from the air quality monitoring network located in Seoul are employed to verify the prediction effectiveness of the studied models. Six measures of effectiveness are used for assessing the prediction quality. Results showed that deep learning models are more efficient than the other investigated machine learning models (i.e., SVR, GPR, bagged and boosted trees, RF, and XGBoost).
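The lagged-measurement idea described above can be sketched as a simple feature-engineering step: past observations of a pollutant are appended as extra input columns so that models can exploit temporal dynamics. This is an illustrative assumption of how such features might be built (the column name, lag count, and toy values are hypothetical, not taken from the paper):

```python
import pandas as pd

def add_lags(df: pd.DataFrame, column: str, n_lags: int) -> pd.DataFrame:
    """Append lagged copies of `column` as new feature columns."""
    out = df.copy()
    for k in range(1, n_lags + 1):
        out[f"{column}_lag{k}"] = out[column].shift(k)
    # Drop the initial rows that lack a complete lag history.
    return out.dropna()

# Hypothetical hourly PM2.5 readings (the study uses Seoul monitoring data).
series = pd.DataFrame({"PM25": [10.0, 12.0, 9.0, 15.0, 14.0]})
lagged = add_lags(series, "PM25", n_lags=2)
```

Each retained row then carries the current PM2.5 value together with its two previous hourly values, which any of the regression models discussed here could consume as additional predictors.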
Also, the results showed that the bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent unit (BiGRU) networks produce higher performance than both the investigated machine learning models (i.e., SVR, GPR, bagged and boosted trees, RF, and XGBoost) and the other deep learning models (i.e., LSTM, GRU, and convolutional neural networks).
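The abstract mentions six measures of effectiveness without naming them. As a hedged illustration, three metrics commonly used for this kind of regression comparison (RMSE, MAE, and R²) can be computed as follows; the choice of these particular metrics and the toy data are assumptions, not details from the paper:

```python
import numpy as np

def rmse(y: np.ndarray, yhat: np.ndarray) -> float:
    """Root mean squared error between observed and predicted values."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y: np.ndarray, yhat: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(y - yhat)))

def r2(y: np.ndarray, yhat: np.ndarray) -> float:
    """Coefficient of determination (fraction of variance explained)."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical observed vs. predicted hourly PM2.5 concentrations.
y = np.array([10.0, 12.0, 9.0, 15.0])
yhat = np.array([11.0, 11.0, 10.0, 14.0])
```

Lower RMSE and MAE and higher R² indicate better predictive quality, which is the basis on which model comparisons like the one in this study are typically reported.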

