Review of Classification Methods on Unbalanced Data Sets

Le Wang, Xiaojuan Li, Haodong Cheng, Ni Zhang, Meng Han

doi:10.1109/access.2021.3074243


Open Access

https://doi.org/10.1109/access.2021.3074243


Journal: IEEE Access
Publication Date: Jan 1, 2021
Citations: 176
License type: CC BY 4.0

Affiliation: North Minzu University

Abstract

This paper reviews classification methods for unbalanced data sets. It first briefly introduces this kind of data set, then analyzes the classification methods in detail from several perspectives: data sampling, algorithm level, feature level, cost-sensitive functions, and deep learning. The data sampling methods are further grouped by underlying technique: methods based on the synthetic minority over-sampling technique (SMOTE), on support vector machines (SVM), on k-nearest neighbors (KNN), and others. The advantages and disadvantages of these methods are then compared. Finally, the evaluation criteria for classifiers on unbalanced data sets are summarized, and directions for future work are outlined.

Highlights

Data tends to change its characteristics over time. Because the number of learning instances is not equal across the classes considered, this distribution makes the data sets difficult to classify
The true positive rate and false positive rate from the confusion matrix, the ROC curve, G-means, and similar measures are usually used to evaluate classifiers on unbalanced data, because they reflect the characteristics of unbalanced data sets and so measure classifier performance better
Classifying unbalanced data sets is of great significance in data mining, because such data sets are very common in real life and the problems they cause are becoming more and more apparent
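
The evaluation measures named in the highlights can be illustrated with a small sketch. The code below is not taken from the paper; it assumes a binary problem with the minority class labelled 1, and the function names `confusion_counts` and `imbalance_metrics` are hypothetical. It computes the true positive rate, false positive rate, and the G-mean (the geometric mean of the true positive and true negative rates) from confusion-matrix counts, and shows why plain accuracy can mislead on unbalanced data.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, TPR, FPR and G-mean; empty denominators give 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the minority class
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tnr = tn / (fp + tn) if (fp + tn) else 0.0  # recall on the majority class
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "tpr": tpr,
        "fpr": fpr,
        "g_mean": sqrt(tpr * tnr),
    }


# Assumed toy data: 95 majority instances, 5 minority instances,
# and a classifier that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
metrics = imbalance_metrics(y_true, y_pred)
print(metrics)  # accuracy is 0.95, while tpr and g_mean are both 0.0
```

In this toy case accuracy looks high although no minority instance is found, whereas the G-mean drops to zero, which is the behaviour that motivates these measures for unbalanced data.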

Summary

INTRODUCTION

Data tends to change its characteristics over time. Because the number of learning instances is not equal across the classes considered, this distribution makes the data sets difficult to classify. A characteristic of unbalanced data sets is that the instances of interest when mining them often belong to the minority class, which has only a small number of instances. Classification algorithms for unbalanced data sets that use sampling are summarized at the end of this chapter according to the type of technique used, which is clearer than in previous reviews. (1) This paper summarizes and analyzes the classification methods for unbalanced data sets in detail from the aspects of data sampling, algorithm level, feature level, and deep learning. (2) Within the sampling methods, this paper summarizes the classification methods for unbalanced data sets from three aspects: synthetic minority over-sampling technique (SMOTE), support vector machine (SVM), and k-nearest neighbor (KNN), in contrast to previous reviews. The external method mentioned is the classification method based on the data sampling technique introduced in this chapter; the internal method, which creates or modifies the algorithm, is described in detail in the chapter
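
As a rough illustration of the SMOTE idea mentioned above, the following is a minimal sketch and not the authors' implementation or any library's API. It assumes minority instances are given as tuples of floats, and the function name `smote_sketch` and its parameters are hypothetical. Each synthetic instance is placed at a random position on the line segment between a randomly chosen minority instance and one of its k nearest minority neighbors.

```python
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority instances by neighbor interpolation.

    minority: list of equal-length tuples of floats (assumed input format).
    Simplified: neighbors are recomputed per draw with a brute-force search.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority instances by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbor))
        )
    return synthetic


# Assumed toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for point in smote_sketch(minority, n_synthetic=3, k=2):
    print(point)
```

Because every synthetic point is an interpolation between two existing minority instances, the new points stay inside the region already occupied by the minority class rather than duplicating instances exactly, which is the main difference from simple random over-sampling.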

UNBALANCED DATA SETS CLASSIFICATION METHOD BASED ON SAMPLING METHOD

SAMPLING METHOD BASED ON SVM

OTHER UNBALANCED DATA SET CLASSIFICATION METHODS

CLASSIFICATION ALGORITHM OF UNBALANCED DATA SETS AT FEATURE LEVEL

EVALUATION CRITERION OF CLASSIFIER

FUTURE WORK

CONCLUSION


