Classification of Air Pollutant Index on Data with Outliers and Imbalance Class Problem

Dwi Krisbiantoro, Uswatun Hasanah, Retno Waluyo, Sarmini Sarmini, Irfan Pratama

doi:10.62527/joiv.8.3.1993

Abstract

Air pollution has become a global issue that has drawn attention from many countries, and Jakarta, Indonesia's capital city, is not exempt from it. This study uses the pollutant parameters PM10, SO2, CO, O3, and nitrogen dioxide (NO2) to categorize Jakarta's air quality. The data are daily measurements taken from the Air Quality Monitoring Station in Jakarta throughout 2020. The methods used include SVM, Random Forest, Logistic Regression, KNN, CART, and a Stacking Algorithm. At the data preparation stage, we found missing values, outliers, and class imbalance problems. Before applying the machine learning methods and evaluating accuracy, we applied data pre-processing techniques such as the MissForest method, median substitution, and ADASYN. The results show that the original dataset yields higher accuracy scores (0.882 to 0.977) than the balanced dataset (0.829 to 0.976). According to the evaluation results, Random Forest achieves the highest accuracy score on both the original and the balanced datasets. The overall result is better than that of a comparable prior study, which reported 96.61% accuracy using a neural network. This shows that preprocessing steps such as missing-value handling and imbalanced-class handling are essential to classification performance.
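As an illustration of one preprocessing step named in the abstract, the following is a minimal sketch of median substitution applied to outliers. The abstract does not state how outliers were detected, so the interquartile-range (Tukey fence) rule, the multiplier k=1.5, the function name, and the sample PM10 readings are all assumptions made here for illustration, not the authors' procedure.

```python
from statistics import median, quantiles


def replace_outliers_with_median(values, k=1.5):
    """Replace values outside the Tukey fences with the column median.

    The IQR-based fence rule and k=1.5 are assumptions for this sketch;
    the abstract only names "median substitution".
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    med = median(values)
    # Keep in-range readings; swap out-of-range readings for the median.
    return [v if low <= v <= high else med for v in values]


# Hypothetical daily PM10 readings; 400 is an artificial spike.
pm10 = [52, 48, 55, 61, 50, 400, 47, 58]
cleaned = replace_outliers_with_median(pm10)
print(cleaned)
```

In this sketch the median is computed over the whole column, including the outlier itself, which is robust for a small number of extreme values. A per-class or rolling median would be an alternative design choice.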

Journal: JOIV : International Journal on Informatics Visualization
Publication Date: Sep 30, 2024
License type: CC BY-SA 4.0