Feature selection is one of the important data pre-processing techniques that are used to increase the performance of machine learning models, to build faster and more cost-effective algorithms, and to make it easier to interpret the predictions made by the models. The main objective of this research work is to investigate the influence features to predict particulate matter (PM10). This research uses 24-hour average pollutant concentration data of 36 air quality monitoring stations provided by Gandhinagar Smart City Development Limited (GSCDL), Gandhinagar, Gujarat. Important features were identified using five feature selection techniques (correlation, forward selection, backward elimination, Exhaustive Feature Selection (EFS), and feature importance derived using Random Forest Regressor). With selected features six regression algorithms (Multiple Linear Regression, Random Forest, Decision Tree, K-nearest Neighbour, XGBoost, and Support Vector Regressor) were trained to predict PM10. Further, the models were compared based on the Root Mean Square Error (RMSE) and Coefficient of determination (R2) parameters to identify the model with good performance. This proposed model can be utilized as an early warning system, providing air quality information to local authorities to develop air-quality improvement initiatives.
Read full abstract