Bayes Imbalance Impact Index: A Measure of Class Imbalanced Data Set for Classification Problem.

Yang Lu, Yuan Yan Tang, Yiu-Ming Cheung

doi:10.1109/tnnls.2019.2944962

Abstract

Recent studies of imbalanced data classification have shown that the imbalance ratio (IR) is not the only cause of performance loss in a classifier, as other data factors, such as small disjuncts, noise, and overlapping, can also make the problem difficult. The relationship between the IR and other data factors has been demonstrated, but to the best of our knowledge, there is no measurement of the extent to which class imbalance influences the classification performance of imbalanced data. In addition, it is also unknown which data factor serves as the main barrier for classification in a data set. In this article, we focus on the Bayes optimal classifier and examine the influence of class imbalance from a theoretical perspective. We propose an instance measure called the Individual Bayes Imbalance Impact Index (IBI3) and a data measure called the Bayes Imbalance Impact Index (BI3). IBI3 and BI3 reflect the extent of influence using only the imbalance factor, in terms of each minority class sample and the whole data set, respectively. Therefore, IBI3 can be used as an instance complexity measure of imbalance and BI3 as a criterion to demonstrate the degree to which imbalance deteriorates the classification of a data set. We can, therefore, use BI3 to assess whether it is worth using imbalance recovery methods, such as sampling or cost-sensitive methods, to recover the performance loss of a classifier. The experiments show that IBI3 is highly consistent with the increase in the prediction score obtained by the imbalance recovery methods and that BI3 is highly consistent with the improvement in the F1 score obtained by the imbalance recovery methods on both synthetic and real benchmark data sets.
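
The abstract does not give the formulas for IBI3 or BI3, so the following is only a rough illustration of the underlying idea, not the paper's definition. It assumes a toy one-dimensional problem with two known Gaussian class densities (the means, standard deviation, and function names are all hypothetical) and compares the Bayes posterior of the minority class under the actual imbalanced prior with the posterior the same point would receive if the classes were balanced. The gap between the two is one plausible way to see how much the imbalance factor alone shifts the Bayes optimal decision for a given sample.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def minority_posterior(x, prior_minority, mean_min=1.0, mean_maj=-1.0, std=1.0):
    """Bayes posterior P(minority | x) for two Gaussian classes.

    The class-conditional densities are assumed known (toy setting);
    only the class prior changes between the balanced and imbalanced cases.
    """
    p_min = prior_minority * gaussian_pdf(x, mean_min, std)
    p_maj = (1.0 - prior_minority) * gaussian_pdf(x, mean_maj, std)
    return p_min / (p_min + p_maj)


def posterior_gap(x, prior_minority):
    """Illustrative per-sample gap (an assumption, not the paper's IBI3):
    posterior under a balanced prior minus posterior under the actual prior."""
    return minority_posterior(x, 0.5) - minority_posterior(x, prior_minority)


if __name__ == "__main__":
    prior = 0.1  # imbalance ratio of 9:1 (majority:minority)
    for x in (0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  "
              f"imbalanced={minority_posterior(x, prior):.3f}  "
              f"balanced={minority_posterior(x, 0.5):.3f}  "
              f"gap={posterior_gap(x, prior):.3f}")
```

In this toy setting the gap is largest for points near the class boundary and shrinks for points deep inside the minority region, which matches the intuition that imbalance hurts some minority samples far more than others.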

Journal: IEEE Transactions on Neural Networks and Learning Systems
Publication Date: Nov 1, 2019
Citations: 86

Similar Papers

Effects of class imbalance on resampling and ensemble learning for improved prediction of cyanobacteria blooms
Jihoon Shin ... Yoonkyung Cha
Ecological Informatics | VOL. 61
09 Nov 2020

Dissimilarity-Based Learning from Imbalanced Data with Small Disjuncts and Noise
V García ... L Cleofas-Sánchez
01 Jan 2015

Class-imbalanced classifiers for high-dimensional data
W.-J Lin ... J J Chen
Briefings in Bioinformatics | VOL. 14
09 Mar 2012

Smooth Soft-Balance Discriminative Analysis for imbalanced data
Xinyue Wang ... Tieyong Zeng
Knowledge-Based Systems | VOL. 228
22 Mar 2021
