Abstract

Many real-world application datasets, such as those in medical diagnostics, fraud detection, biological classification, and risk analysis, suffer from class imbalance and class overlap. These problems seriously affect the learning of classification models because minority instances are not visible to the learner in the overlapped region, and learner performance is biased towards the majority class. Undersampling-based methods are the most commonly used techniques to handle these problems, but their major weakness is excessive elimination and information loss; that is, they fail to retain potentially informative majority instances. Moreover, most existing methods improve sensitivity significantly but not other performance measures. We propose a novel entropy- and neighborhood-based undersampling (ENU) method that removes from the overlapped region only those majority instances whose informativeness (entropy) score falls below a threshold entropy. ENU first computes an entropy score and threshold for the majority instances, then uses a local density-based improved KNN search to identify overlapped majority instances. To tackle the problem effectively, ENU defines four improved KNN-based undersampling procedures (ENUB, ENUT, ENUC, and ENUR). ENU outperforms existing state-of-the-art methods in average ranking on sensitivity, G-mean, and F1-score, with reduced information loss.
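The abstract does not spell out the ENU procedures in detail. The following is a minimal illustrative sketch of the general idea, assuming Shannon entropy over the k-nearest-neighbor class distribution as the informativeness score and the mean entropy of the overlapped region as the threshold; the function name, the brute-force KNN, and the thresholding rule are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def neighborhood_entropy_undersample(X, y, k=5, majority_label=0):
    """Illustrative entropy/neighborhood-based undersampling sketch.

    For each majority instance, the Shannon entropy of the class
    distribution among its k nearest neighbors serves as an
    informativeness score. Majority instances in the overlapped region
    (neighborhoods containing at least one minority instance) whose
    score falls below the mean overlapped-region entropy are removed.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    maj_idx = np.where(y == majority_label)[0]

    # Brute-force KNN: distances from each majority point to all points.
    d = np.linalg.norm(X[maj_idx, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(maj_idx)), maj_idx] = np.inf  # exclude self-distance
    nn = np.argsort(d, axis=1)[:, :k]

    scores, overlapped = [], []
    for row in nn:
        labels = y[row]
        # Two-class neighborhood distribution: (minority, majority) shares.
        p = np.bincount((labels == majority_label).astype(int), minlength=2) / k
        p = p[p > 0]
        scores.append(-(p * np.log2(p)).sum())
        overlapped.append(bool((labels != majority_label).any()))
    scores, overlapped = np.array(scores), np.array(overlapped)

    # Threshold: mean entropy among overlapped majority instances (assumed).
    threshold = scores[overlapped].mean() if overlapped.any() else 0.0
    drop = maj_idx[overlapped & (scores < threshold)]
    keep = np.setdiff1d(np.arange(len(y)), drop)
    return X[keep], y[keep]
```

Only majority instances are ever candidates for removal, so the minority class is preserved intact, which is the property that protects sensitivity while limiting information loss.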
