Abstract
Air pollution has become a global problem that has drawn attention from many countries, and Jakarta, Indonesia's capital city, is no exception. This study uses five pollutant parameters, PM10, SO2, CO, O3, and nitrogen dioxide (NO2), to categorize Jakarta's air quality. The data are daily measurements taken from the Air Quality Monitoring Station in Jakarta throughout 2020. The methods used include SVM, Random Forest, Logistic Regression, KNN, CART, and a Stacking Algorithm. At the data preparation stage, we found missing values, outliers, and class imbalance. Before applying the machine learning methods and evaluating accuracy, we therefore pre-processed the data with the MissForest method, median substitution, and ADASYN. The results show that the original dataset yields a slightly higher accuracy range (0.882 to 0.977) than the balanced dataset (0.829 to 0.976). According to the evaluation results, Random Forest has the highest accuracy score on both the original and the balanced datasets. The overall result is better than that of a comparable earlier study, which reached 96.61% accuracy using a neural network. This indicates that pre-processing steps such as handling missing values and class imbalance are essential to classification performance.
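As an illustration of the median substitution step mentioned above, here is a minimal sketch. It assumes that median substitution was applied to outliers and that outliers are flagged with the 1.5 × IQR rule; the abstract states neither detail, and the function name and the PM10 readings are hypothetical.

```python
import statistics


def replace_outliers_with_median(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the median.

    The IQR rule and k=1.5 are assumptions for illustration, not the
    study's documented procedure.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    median = statistics.median(values)
    return [v if low <= v <= high else median for v in values]


# Hypothetical daily PM10 readings; 400 is an injected outlier.
pm10 = [52, 48, 55, 60, 47, 400, 51, 58]
cleaned = replace_outliers_with_median(pm10)
print(cleaned)
```

The median is used as the replacement value because, unlike the mean, it is barely affected by the extreme readings being replaced.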
More From: JOIV : International Journal on Informatics Visualization