SYNTHETIC OVERSAMPLING OF INSTANCES USING CLUSTERING

Atlántida I Sánchez, Jesus A Gonzalez, Eduardo F Morales

doi:10.1142/s0218213013500085

Abstract

Class-imbalanced data sets are common in many real-world applications. Because many classifiers perform worse on the minority class, several approaches have been proposed to deal with this problem. In this paper, we propose two new cluster-based oversampling methods, SOI-C and SOI-CJ. The proposed methods create clusters from the minority class instances and generate synthetic instances inside those clusters. In contrast with other oversampling methods, the proposed approaches avoid creating new instances in majority class regions. They are also more robust to noisy examples, because the number of new instances generated per cluster is proportional to the cluster's size. The clusters are generated automatically. Our new methods do not require tuning parameters, and they can deal with both numerical and nominal attributes. The two methods were tested with twenty artificial datasets and twenty-three datasets from the UCI Machine Learning Repository. In our experiments, we used six classifiers and evaluated the results with recall, precision, F-measure, and AUC, which are more suitable for class-imbalanced datasets. We performed ANOVA and paired t-tests to show that the proposed methods are competitive with, and in many cases significantly better than, the other oversampling methods used in the comparison.
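
The abstract does not spell out how SOI-C or SOI-CJ work internally, so the following is only a minimal sketch of the general idea it describes: group the minority instances into clusters, give each cluster a share of the new instances proportional to its size, and create each synthetic instance between two members of the same cluster. The function name `oversample_in_clusters` is hypothetical, the cluster labels are assumed to be computed beforehand (the paper's automatic clustering step is not reproduced), simple linear interpolation stands in for whatever generation rule the authors use, and only numerical attributes are handled.

```python
import random
from collections import defaultdict


def oversample_in_clusters(points, labels, n_new, seed=0):
    """Return n_new synthetic points, each lying between two members
    of the same minority-class cluster (illustrative sketch only)."""
    rng = random.Random(seed)

    # Group minority instances by their (precomputed, assumed) cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    # Allocate new instances proportionally to cluster size, using
    # largest-remainder rounding so the quotas sum exactly to n_new.
    total = len(points)
    exact = {c: n_new * len(m) / total for c, m in clusters.items()}
    quota = {c: int(e) for c, e in exact.items()}
    leftover = n_new - sum(quota.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1

    # Interpolate between two randomly chosen members of the same cluster,
    # so every synthetic point stays inside that cluster's bounding box.
    synthetic = []
    for c, members in clusters.items():
        for _ in range(quota[c]):
            a, b = rng.choice(members), rng.choice(members)
            t = rng.random()
            synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Two minority clusters: four points near the origin, two near (10, 10).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
            (10.0, 10.0), (11.0, 11.0)]
cluster_labels = [0, 0, 0, 0, 1, 1]
new_points = oversample_in_clusters(minority, cluster_labels, n_new=6)
```

Because each synthetic point is built from two members of one cluster, no point is placed in the gap between the two clusters, which is where majority instances could plausibly lie. A small cluster (for example, one formed by a few noisy instances) receives correspondingly few synthetic points.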

Journal: International Journal on Artificial Intelligence Tools
Publication Date: Apr 1, 2013
Citations: 22

Similar Papers

Mega trend diffusion-siamese network oversampling for imbalanced datasets’ SVM classification
Liang-Sian Lin ... Yi-Ting Chen
Applied Soft Computing | VOL. 143
12 May 2023

Borderline over-sampling for imbalanced data classification
Hien M Nguyen ... Eric W Cooper
International Journal of Knowledge Engineering and Soft Data Paradigms | VOL. 3
01 Jan 2010

A Comparison Study of Cost-Sensitive Learning and Sampling Methods on Imbalanced Data Sets
Jin Wei Zhang ... Yi Lu
Advanced Materials Research | VOL. 271-273
01 Jul 2011

Diversity analysis on imbalanced data sets by using ensemble models
Shuo Wang ... Xin Yao
01 Mar 2009