Abstract

In machine learning, building an effective learning model is difficult when the class distribution of the training data set is imbalanced. Class imbalance is common in real-world classification problems, where one class has very few instances (the minority class) while another has many (the majority class). A learning model built without accounting for class imbalance is dominated by majority-class instances and tends to ignore minority-class predictions. Random undersampling and oversampling techniques have been widely used in previous studies to address class imbalance. This study uses an undersampling strategy based on clustering, with C4.5 as the classification model. Clustering is used to group the data, and undersampling is performed within each group so that informative samples are not eliminated. Statistical tests on experiments with 10 imbalanced data sets of various sample sizes from the KEEL repository and Kaggle show that clustering-based undersampling produces satisfactory performance: the sensitivity and AUC values increase significantly.
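The idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses KMeans for the clustering step, and substitutes scikit-learn's CART-style `DecisionTreeClassifier` for C4.5, which scikit-learn does not provide. The sampling shares per cluster are a plain proportional allocation chosen for the sketch.

```python
# Hedged sketch of clustering-based undersampling: cluster the majority
# class, then sample from every cluster so no region is discarded entirely.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def cluster_undersample(X_maj, n_target, n_clusters=5, seed=0):
    """Keep roughly n_target majority rows, drawn from each KMeans
    cluster in proportion to its size (illustrative allocation)."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X_maj)
    kept = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        # at least one sample per cluster, otherwise proportional share
        k = max(1, round(n_target * len(idx) / len(X_maj)))
        kept.extend(rng.choice(idx, size=min(k, len(idx)), replace=False))
    return X_maj[kept]

# Toy imbalanced data: 200 majority vs 20 minority instances.
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(3.0, 1.0, size=(20, 2))

X_maj_small = cluster_undersample(X_maj, n_target=20)
X = np.vstack([X_maj_small, X_min])
y = np.array([0] * len(X_maj_small) + [1] * len(X_min))

# Train the decision tree on the rebalanced training set.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(len(X_maj_small), "majority samples kept")
```

Sampling within each cluster, rather than from the majority class as a whole, is what preserves small but potentially informative subgroups that plain random undersampling could wipe out.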
