An oversampling method for multi-class imbalanced data based on composite weights.

Mingyang Deng, Fuwei Wu, Yingshi Guo, Chang Wang, Wajid Mumtaz

doi:10.1371/journal.pone.0259227

Open Access

https://doi.org/10.1371/journal.pone.0259227

Journal: PloS one	Publication Date: Nov 12, 2021
Citations: 8	License type: CC BY 4.0

Affiliation: Changchun University of Technology, Chang'an University

Abstract

To address the oversampling problem for multi-class small samples and to improve their classification accuracy, we develop an oversampling method based on classification ranking and weight setting. The algorithm first sorts the data within each class of the dataset by the distance from each original data point to the hyperplane. It then performs iterative sampling within each class and inter-class sampling at the boundaries of adjacent classes, using a sampling weight composed of data density and data sorting. Finally, information assignment is performed on all newly generated samples. The algorithm is trained and tested on UCI imbalanced datasets, and composite metrics are established to evaluate its performance against other algorithms in a comprehensive evaluation. The results show that the proposed algorithm balances the multi-class imbalanced data in terms of quantity, and that the newly generated data maintain the distribution characteristics and information properties of the original samples. Moreover, compared with other algorithms such as SMOTE and SVMOM, the proposed algorithm reaches a higher classification accuracy of about 90%. We conclude that the algorithm is practical and generally applicable to imbalanced multi-class samples.
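The steps named in the abstract (sorting by distance to the hyperplane, a sampling weight built from density and sorting, then generating new samples) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the rank score, the k-nearest-neighbour density estimate, their product as the composite weight, and the SMOTE-style interpolation are all assumptions, as are the function names `composite_weights` and `oversample`. The hyperplane `(w, b)` is assumed to be supplied by some already-trained linear classifier, and each class is assumed to hold at least two points.

```python
import math
import random


def distance_to_hyperplane(x, w, b):
    """Unsigned distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def composite_weights(points, w, b, k=2):
    """Hypothetical composite weight: normalised (rank score * local density)."""
    n = len(points)
    # Sort the class by distance to the hyperplane. Assumption: points closer
    # to the boundary receive a higher rank score.
    order = sorted(range(n), key=lambda i: distance_to_hyperplane(points[i], w, b))
    rank_score = [0.0] * n
    for rank, i in enumerate(order):
        rank_score[i] = (n - rank) / n
    # Assumption: density is the inverse mean distance to the k nearest
    # points of the same class.
    density = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_d = sum(dists[:k]) / min(k, len(dists))
        density.append(1.0 / (mean_d + 1e-12))
    raw = [r * d for r, d in zip(rank_score, density)]
    total = sum(raw)
    return [v / total for v in raw]


def oversample(points, w, b, n_new, k=2, seed=0):
    """Draw seed points by composite weight, then interpolate (SMOTE-style)
    toward one of the k nearest same-class neighbours."""
    rng = random.Random(seed)
    weights = composite_weights(points, w, b, k)
    indices = range(len(points))
    new_points = []
    for _ in range(n_new):
        i = rng.choices(indices, weights=weights)[0]
        neighbours = sorted(
            (j for j in indices if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        j = rng.choice(neighbours)
        t = rng.random()
        new_points.append(
            tuple(a + t * (c - a) for a, c in zip(points[i], points[j]))
        )
    return new_points


# Hypothetical minority class and hyperplane x = 2.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = oversample(minority, w=(1.0, 0.0), b=-2.0, n_new=6)
```

Because every synthetic point is interpolated between two existing points of the same class, it stays inside that class's convex hull. The inter-class sampling at class boundaries and the information assignment step mentioned in the abstract are not covered by this sketch.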

Highlights

Imbalanced data is one of the important problems to be solved in machine learning and data mining
Studies have shown that when imbalanced data are classified, the classification hyperplane is shifted toward the small-sample side because of the support of the large sample size, so small samples are misclassified and the classification accuracy of imbalanced data is low
Comparing the CI values of different algorithms shows that the classification oversampling method proposed in this paper is not significantly superior to the other algorithms on the composite indicator AUC, but its CI value is significantly higher than those of the SMOTE, SVMOM and SMO+TLK algorithms, which indicates that the algorithm has good sampling capability for imbalanced data

Summary

Introduction

Imbalanced data is one of the important problems to be solved in machine learning and data mining. Imbalanced data classification is widely used in data processing in fields such as social surveys, disaster prediction and disease prevention [1,2,3]. Studies have shown that when imbalanced data are classified, the classification hyperplane is shifted toward the small-sample side because of the support of the large sample size, so small samples are misclassified and the classification accuracy of imbalanced data is low. In multi-class imbalanced data, the classification hyperplane is affected by the differences in size among the classes, so its classification accuracy cannot meet the needs of scientific computing. The classification of multi-class imbalanced data has therefore become a key problem in data processing research [4].

Similar Papers

Enhancing classification performance of multi-class imbalanced data using the OAA-DB algorithm
Piyasak Jeatrakul ... Kok Wai Wong
01 Jun 2012

A survey of multi-class imbalanced data classification methods
Meng Han ... Dongliang Mu
Journal of Intelligent & Fuzzy Systems | VOL. 44
30 Jan 2023

Bagging Using Instance-Level Difficulty for Multi-Class Imbalanced Big Data Classification on Spark
William C Sleeman Iv ... Bartosz Krawczyk
01 Dec 2019

Multi-class imbalanced big data classification on Spark
William C Sleeman Iv ... Bartosz Krawczyk
Knowledge-Based Systems | VOL. 212
07 Nov 2020
