Air quality monitoring based on chemical and meteorological drivers: Application of a novel data filtering-based hybridized deep learning model

Mehdi Jamei, Mumtaz Ali, Anurag Malik, Masoud Karbasi, Ekta Sharma, Zaher Mundher Yaseen

doi:10.1016/j.jclepro.2022.134011

Abstract

Particulate matter (PM), or particle pollution, includes the tiny particles of dust and fly ash expelled from coal-burning power plants. Coal combustion is an extremely prevalent source of air pollution, and the resulting PM has substantial impacts on human health, especially in industrial zones. This paper aims to design a novel hybrid deep learning framework based on long short-term memory (LSTM) integrated with a two-stage data filtering technique to accurately predict air quality indices (i.e., PM2.5 and PM10) in a chosen study region, ‘Miles Airport, Queensland’, that meets the needs of the coal seam gas industry in Australia. The data used to construct the hybrid framework comprise six meteorological parameters (i.e., wind direction, wind speed, air temperature, relative humidity, solar radiation, and rainfall) and two environmental factors (i.e., ozone and total suspended particulate). In the first stage, two robust feature selection methods, namely extreme gradient boosting (XGBoost) and the classification and regression tree (CART) approach, were adopted to identify the most significant predictors. In the second stage, the best subset regression (BSR) technique was used to determine the best input combinations (i.e., C1, C2, and C3) based on several metrics. The three BSR-based input combinations were then employed in the LSTM model to estimate PM2.5 and PM10. Furthermore, to validate the main hybrid framework, two advanced machine learning (ML) methods (i.e., LightGBM and kernel ridge regression (KRR)) and two traditional ML methods (i.e., adaptive neuro-fuzzy inference system (ANFIS) and multilayer perceptron neural network (MLP)) were hybridized with the same multi-level data filtering strategy using the optimal input combinations. The hybrid models were evaluated with several statistical metrics, graphical tools, and diagnostic analyses.
The PM2.5 simulation, based on 2375 data samples, showed that LSTM-C3, which contains all the selected predictors, yielded the most promising accuracy, followed by the LightGBM-C3 and MLP-C3 models. Similarly, the PM10 simulation demonstrated that LSTM-C3 was superior to the other models, followed by the KRR-C3 and LightGBM-C3 models.
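The second filtering stage described above, best subset regression, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which metrics rank the subsets, so adjusted R^2 is assumed here, the data are synthetic, and the predictor names and the helper functions (`solve`, `adjusted_r2`, `best_subset`) are hypothetical. The first stage (XGBoost and CART importance ranking) is not reproduced.

```python
from itertools import combinations
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def adjusted_r2(rows, y, cols):
    """Fit ordinary least squares (with intercept) on the given columns; return adjusted R^2."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(cols) + 1  # number of fitted coefficients, intercept included
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations
    pred = [sum(b * v for b, v in zip(beta, xi)) for xi in X]
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - (ss_res / (n - p)) / (ss_tot / (n - 1))


def best_subset(rows, y, names):
    """Exhaustively score every non-empty predictor subset; keep the best one."""
    best = None
    for k in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), k):
            score = adjusted_r2(rows, y, cols)
            if best is None or score > best[0]:
                best = (score, [names[c] for c in cols])
    return best


# Synthetic stand-in data (assumption): the target depends only on
# wind_speed and ozone, plus Gaussian noise.
rng = random.Random(0)
names = ["wind_speed", "air_temperature", "relative_humidity", "ozone"]
rows = [[rng.gauss(0, 1) for _ in names] for _ in range(200)]
y = [3.0 * r[0] - 2.0 * r[3] + rng.gauss(0, 0.5) for r in rows]

score, chosen = best_subset(rows, y, names)
print(chosen, round(score, 3))
```

An exhaustive search is feasible here because only a handful of predictors survive the first filtering stage; with eight candidate inputs there are at most 255 subsets to score.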

Full Text