Air pollution is one of the most serious environmental issues faced by humans, and it affects the quality of life in cities. PM2.5 forecasting models can support strategies for assessing anticipated harmful levels of air pollution and warning the public about them. Accurate measurement and forecasting of pollutant concentrations are critical for assessing air quality and are the foundation for sound strategic decisions. Data-driven machine learning models for PM2.5 forecasting have gained attention in recent years. In this study, PM2.5 prediction for the city of Hyderabad was carried out using several machine learning models: multiple linear regression (MLR), decision tree (DT), k-nearest neighbors (KNN), random forest (RF), and XGBoost. A deep learning model, Long Short-Term Memory (LSTM), was also used. The models were compared on prediction error and R² value, and the best model was taken to be the one with the highest R² and the lowest error. Model performance was further improved through hyperparameter optimization using randomized search with cross-validation (randomized search CV). A spatio-temporal air quality analysis was conducted first, and it found that average winter PM2.5 concentrations were 68% higher than summer concentrations. Among the machine learning models, XGBoost regression performed best, with an R² value of 0.82 and a Mean Absolute Error (MAE) of 7.01 μg/m³. The LSTM deep learning model outperformed XGBoost regression for PM2.5 modeling, with an R² value of 0.89 and an MAE of 5.78 μg/m³.
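The comparison criteria named above, R² and MAE, can be illustrated with a minimal sketch. It is not the study's code: the function names, the model labels `model_a` and `model_b`, and all numbers are hypothetical values chosen for illustration, and selection by highest R² is shown only as one way to apply the stated criterion.

```python
from statistics import fmean


def mean_absolute_error(observed, predicted):
    """MAE: mean absolute difference, in the units of the data (e.g. μg/m³)."""
    return fmean(abs(o - p) for o, p in zip(observed, predicted))


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical PM2.5 values for illustration only; not data from the study.
observed = [40.0, 55.0, 70.0, 85.0]
predictions = {
    "model_a": [42.0, 50.0, 75.0, 80.0],
    "model_b": [41.0, 54.0, 72.0, 84.0],
}

# Score every candidate as an (R², MAE) pair.
scores = {
    name: (r2_score(observed, pred), mean_absolute_error(observed, pred))
    for name, pred in predictions.items()
}

# Pick the candidate with the highest R².
best = max(scores, key=lambda name: scores[name][0])

for name, (r2, mae) in scores.items():
    print(f"{name}: R2={r2:.3f}, MAE={mae:.2f}")
print("best:", best)
```

In this toy case the candidate with the highest R² also has the lowest MAE. When the two metrics disagree, the selection rule has to state which one takes priority.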