Abstract
Minority oversampling techniques play a pivotal role in imbalanced learning. However, traditional oversampling algorithms can introduce intra-class imbalance, ignore the important information carried by boundary samples, and produce new samples that are highly similar to the existing ones. To address these problems, we propose a new oversampling method, the BIRCH and Boundary Midpoint Centroid Synthetic Minority Over-Sampling Technique (BI-BMCSMOTE). First, the algorithm applies BIRCH clustering to group the minority samples quickly; after identifying and removing noise, it marks the boundary minority samples according to a probability. Second, it builds a density function for each sample cluster, computes the cluster density and the corresponding sampling weight, performs midpoint synthesis between the probabilistically marked boundary samples and the other minority samples within each cluster, and then determines the proportion of synthetic samples so as to improve model accuracy. Experimental results show that the algorithm is effective.
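The abstract describes the pipeline only in prose; below is a minimal, hedged Python sketch of such a pipeline, built on scikit-learn's Birch and NearestNeighbors. The noise rule, the boundary-marking probability, the cluster-density weight, and the function name bi_bmcsmote_sketch are illustrative assumptions, not the authors' exact definitions.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.neighbors import NearestNeighbors


def bi_bmcsmote_sketch(X, y, minority_label, k=5, n_new=100, seed=0):
    """Illustrative oversampling sketch (not the authors' implementation)."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]

    # Step 1: BIRCH clustering of the minority class (single scan, CF-tree).
    labels = Birch(threshold=0.5, n_clusters=None).fit_predict(X_min)

    # Step 2: for each minority sample, take the fraction of majority-class
    # points among its k nearest neighbours in the full dataset; use it to
    # drop noise and to mark boundary samples with a probability (assumed rule).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X_min)
    maj_ratio = (y[idx[:, 1:]] != minority_label).mean(axis=1)
    keep = maj_ratio < 1.0                       # all-majority neighbourhood = noise
    X_min, labels, maj_ratio = X_min[keep], labels[keep], maj_ratio[keep]
    marked = rng.random(len(X_min)) < maj_ratio  # probabilistic boundary marking

    # Step 3: weight each cluster by an assumed density proxy (spread / size),
    # so that sparser clusters receive more synthetic samples.
    cluster_ids = np.unique(labels)
    spreads = np.array([X_min[labels == c].std() + 1e-9 for c in cluster_ids])
    sizes = np.array([(labels == c).sum() for c in cluster_ids])
    weights = (spreads / sizes) / (spreads / sizes).sum()

    # Step 4: midpoint synthesis between a marked boundary sample and another
    # minority sample of the same cluster.
    new_rows = []
    for c, w in zip(cluster_ids, weights):
        members = X_min[labels == c]
        flags = marked[labels == c]
        if len(members) < 2 or not flags.any():
            continue
        for _ in range(int(round(w * n_new))):
            a = members[rng.choice(np.flatnonzero(flags))]
            b = members[rng.integers(len(members))]
            new_rows.append((a + b) / 2.0)       # midpoint of the chosen pair

    if not new_rows:
        return X, y
    X_new = np.array(new_rows)
    return (np.vstack([X, X_new]),
            np.concatenate([y, np.full(len(X_new), minority_label)]))
```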
Highlights
Imbalanced data [1] refers to a dataset in which one or several classes contain far more samples than the other classes
In response to the above problems, this paper proposes a Balanced Iterative Reducing and Clustering Using Hierarchies (BIRCH) and Boundary Midpoint Centroid Synthetic Minority Over-Sampling Technique (BI-BMCSMOTE), which consists of four main steps: BIRCH clustering, marking boundary minority samples according to probability, calculating cluster density to weight the samples of each cluster, and synthesizing new samples proportionally
The BI-BMCSMOTE algorithm is executed in four steps: conduct BIRCH clustering in a single scan of the dataset using a tree structure; calculate the number of samples to generate in each cluster according to the cluster density; identify the boundary minority samples and mark them according to probability; and synthesize new samples proportionally from the marked boundary minority samples and the normal minority samples (see the usage sketch after these highlights)
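As referenced in the last highlight, here is a hedged usage example of the illustrative bi_bmcsmote_sketch function defined after the abstract, applied to a synthetic imbalanced dataset. The dataset parameters, the 5% minority ratio, and the choice of logistic regression are arbitrary stand-ins, not the paper's experimental setup, and the number of generated samples is only approximate.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an imbalanced benchmark: roughly 5% minority class.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
print("before oversampling:", Counter(y_tr))

# Oversample only the training split, then fit a standard classifier.
X_bal, y_bal = bi_bmcsmote_sketch(X_tr, y_tr, minority_label=1, n_new=1000)
print("after oversampling:", Counter(y_bal))

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
print("minority-class F1 on the untouched test split:",
      f1_score(y_te, clf.predict(X_te)))
```

Oversampling is applied after the train/test split so that the synthetic points cannot leak into the evaluation set, which is the usual way such techniques are assessed.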
Summary
Imbalanced data [1] refers to a dataset in which one or several classes contain far more samples than the other classes. Data mining approaches are widely used to build models and support decision making, but traditional classification models are inefficient when classifying imbalanced data. This is because models built with standard classifiers, such as logistic regression, support vector machines and decision trees, perform poorly and distort some minority samples [2], or because genuine exceptions are mistaken for noise and vice versa [3]. The problems caused by imbalanced data arise in many areas of data mining, such as credit card fraud [4], medical diagnosis [5], network intrusion [6] and oil leakage [7].