Air pollution is an urgent global environmental problem with significant impacts on public health and ecosystem stability. This research develops an air quality classification model using the Global Air Pollution dataset from Kaggle, which consists of 23,463 rows and 12 features, including key variables such as the Air Quality Index (AQI), PM2.5, NO2, and O3. Decision Tree, Random Forest, and Support Vector Machine (SVM) algorithms are applied to the classification task, with a focus on hyperparameter tuning to increase model accuracy. The Decision Tree gives the best results, with an accuracy of 99.89% after hyperparameter tuning with the Grid Search method. The SVM model improved from 94.89% to 99.32%, while Random Forest recorded an accuracy of 96.87% with no significant improvement after tuning. Feature importance analysis identified PM2.5 and AQI as the dominant factors influencing air quality, with PM2.5 having the highest importance value of 0.93. This research confirms that machine learning can be an effective tool for classifying air pollution data. Integrating this model into a real-time air quality monitoring system could support more responsive and precise decisions in dealing with air pollution problems.
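The tuning step described above can be illustrated with a minimal sketch, assuming scikit-learn is used. The data here is a synthetic stand-in generated with `make_classification`, not the Kaggle dataset, and the search grid is hypothetical, since the abstract does not list the hyperparameter values that were searched. The sketch is not expected to reproduce the reported accuracies or importance values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the Global Air Pollution dataset.
# Four numeric columns play the role of pollutant features; three classes
# play the role of air quality categories.
X, y = make_classification(
    n_samples=600, n_features=4, n_informative=3, n_redundant=1,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Hold out a test set so the tuned model is scored on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid (assumption): the values searched in the study are not given.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5, 10],
    "criterion": ["gini", "entropy"],
}

# Grid Search: evaluate every parameter combination with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# The best estimator is refit on the full training split by default.
best_tree = search.best_estimator_
test_accuracy = best_tree.score(X_test, y_test)

# Impurity-based feature importances, one value per input column.
importances = best_tree.feature_importances_

print("Best parameters:", search.best_params_)
print("Test accuracy:", round(test_accuracy, 4))
print("Feature importances:", importances.round(3))
```

The same pattern would apply to the Random Forest and SVM models by swapping the estimator and the grid. For the SVM, feature scaling before fitting is usually advisable.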