SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering

José A Sáez, Julián Luengo, Jerzy Stefanowski, Francisco Herrera

doi:10.1016/j.ins.2014.08.051

Abstract

Classification datasets often have an unequal class distribution among their examples. This problem is known as imbalanced classification. The Synthetic Minority Over-sampling Technique (SMOTE) is one of the best-known data pre-processing methods for coping with it by balancing the number of examples in each class. However, as recent works claim, class imbalance is not a problem in itself, and performance degradation is also associated with other factors related to the distribution of the data. One of these is the presence of noisy and borderline examples, the latter lying in the areas surrounding class boundaries. Certain intrinsic limitations of SMOTE can aggravate the problem produced by these types of examples, and current generalizations of SMOTE are not correctly adapted to their treatment. This paper proposes the extension of SMOTE through a new element, an iterative ensemble-based noise filter called Iterative-Partitioning Filter (IPF), which can overcome the problems produced by noisy and borderline examples in imbalanced datasets. This extension results in SMOTE–IPF. The properties of this proposal are discussed in a comprehensive experimental study. It is compared against a basic SMOTE and its most well-known generalizations. The experiments are carried out both on a set of synthetic datasets with different levels of noise and shapes of borderline examples and on real-world datasets. Furthermore, the impact of introducing additional types and levels of noise into these real-world data is studied. The results show that the new proposal performs better than existing SMOTE generalizations in all these scenarios. The analysis of these results also helps to identify the characteristics of IPF that differentiate it from other filtering approaches.
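
The two-stage idea described in the abstract (oversample the minority class with SMOTE, then pass the result through an iterative, partition-based ensemble noise filter) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the abstract does not specify the filter's internals. Everything below is an assumption made for illustration: the function names `smote`, `nearest_centroid` and `ipf`, the nearest-centroid base learner, the deterministic interleaved partitions, the majority-vote removal rule, and the "stop when nothing is removed" criterion.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours.
    Points are tuples; at least two minority points are required."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def nearest_centroid(train):
    """Stand-in base learner (assumption): predict the class whose
    centroid is closest to the query point."""
    sums = {}
    for x, y in train:
        total, count = sums.get(y, ((0.0,) * len(x), 0))
        sums[y] = (tuple(t + v for t, v in zip(total, x)), count + 1)
    centroids = {y: tuple(t / n for t in total) for y, (total, n) in sums.items()}
    return lambda x: min(centroids, key=lambda y: math.dist(x, centroids[y]))


def ipf(data, n_parts=3, max_iter=10):
    """Simplified iterative partitioning filter (assumed details):
    train one model per partition, let every model classify every example,
    drop examples misclassified by a strict majority of models, and repeat
    until an iteration removes nothing."""
    data = list(data)
    for _ in range(max_iter):
        parts = [data[i::n_parts] for i in range(n_parts)]  # interleaved split
        models = [nearest_centroid(p) for p in parts if p]
        kept = [(x, y) for x, y in data
                if 2 * sum(m(x) != y for m in models) <= len(models)]
        if len(kept) == len(data):
            break
        data = kept
    return data


# Toy data: class 0 is the majority, class 1 the minority.
majority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (1.0, 0.5)]
minority = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
noisy = ((0.4, 0.6), 1)  # a minority-labelled point inside the majority region

synthetic = smote(minority, n_new=3)
balanced = ([(p, 0) for p in majority]
            + [(p, 1) for p in minority + synthetic]
            + [noisy])
cleaned = ipf(balanced)
print(len(balanced), "examples before filtering,", len(cleaned), "after")
```

In this toy setup the filter runs after oversampling, so it can discard both original and synthetic examples that the ensemble considers mislabelled. A decision-tree base learner and random partitions would be closer to how such filters are usually described, but they are left out here to keep the sketch dependency-free.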

Journal: Information Sciences
Publication Date: Sep 4, 2014
Citations: 510

Similar Papers

Sampling technique for noisy and borderline examples problem in imbalanced classification
Abhishek Dixit ... Ashish Mani
Applied Soft Computing | VOL. 142
02 May 2023

Comparative Multinomial Text Classification Analysis of Naïve Bayes and XGBoost with SMOTE on Imbalanced Dataset
Ashish Chaturvedi ... Santosh Yadav
05 Sep 2021

Analysis of Student Graduation Prediction Using Machine Learning Techniques on an Imbalanced Dataset: An Approach to Address Class Imbalance
Dedy Hermanto ... Muhammad Rizky Pribadi
Scientific Journal of Informatics | VOL. 11
05 Aug 2024

A three-step combination strategy for addressing outliers and class imbalance in software defect prediction
Muhammad Rizky Pribadi ... Hendry Hendry
IAES International Journal of Artificial Intelligence (IJ-AI) | VOL. 13
01 Sep 2024
