Air pollution is a serious challenge to humankind as it poses many health threats. It can be measured using the air quality index (AQI). Air pollution is the result of contamination of both outdoor and indoor environments. The AQI is being monitored by various institutions globally. The measured air quality data are kept mostly for public use. Using the previously calculated AQI values, the future values of AQI can be predicted, or the class/category value of the numeric value can be obtained. This forecast can be performed with more accuracy using supervised machine learning methods. In this study, multiple machine-learning approaches were used to classify PM2.5 values. The values for the pollutant PM2.5 were classified into different groups using machine learning algorithms such as logistic regression, support vector machines, random forest, extreme gradient boosting, and their grid search equivalents, along with the deep learning method multilayer perceptron. After performing multiclass classification using these algorithms, the parameters accuracy and per-class accuracy were used to compare the methods. As the dataset used was imbalanced, a SMOTE-based approach for balancing the dataset was used. Compared to all other classifiers that use the original dataset, the accuracy of the random forest multiclass classifier with SMOTE-based dataset balancing was found to provide better accuracy.
Read full abstract