Abstract

Oversampling is a promising preprocessing technique for imbalanced datasets that generates new minority instances to balance the class distribution. However, improperly generated minority instances, i.e., noise instances, may interfere with the learning of the classifier and degrade its performance. In this paper, we therefore propose a simple and effective oversampling approach, ASN-SMOTE, based on k-nearest neighbors and the synthetic minority oversampling technique (SMOTE). ASN-SMOTE first filters noise in the minority class by checking whether the nearest neighbor of each minority instance belongs to the minority or the majority class. ASN-SMOTE then uses the nearest majority instance of each minority instance to perceive the decision boundary, inside which qualified minority neighbors are selected adaptively for each minority instance by the proposed adaptive neighbor selection scheme to synthesize new minority instances. To substantiate its effectiveness, ASN-SMOTE has been applied to three different classifiers, and comprehensive experiments have been conducted on 24 imbalanced benchmark datasets. ASN-SMOTE is also compared extensively with nine notable oversampling algorithms. The results show that ASN-SMOTE achieves the best results on the majority of the datasets. The ASN-SMOTE implementation is available at: https://www.github.com/yixinkai123/ASN-SMOTE/.
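Based on the description above, the three steps (noise filtering, boundary-aware adaptive neighbor selection, and SMOTE-style interpolation) can be sketched roughly as follows. The function name, its arguments, and the exact neighbor rules are illustrative assumptions drawn from the abstract, not the authors' reference implementation (see their repository for that).

```python
import numpy as np

def asn_smote(X_min, X_maj, n_new, rng=None):
    """Illustrative sketch of the ASN-SMOTE steps described in the abstract."""
    rng = np.random.default_rng(rng)

    # Step 1: noise filtering -- a minority instance is kept only if its
    # nearest neighbour among all instances is also a minority instance.
    X_all = np.vstack([X_min, X_maj])
    labels = np.array([1] * len(X_min) + [0] * len(X_maj))
    kept = []
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X_all - x, axis=1)
        d[i] = np.inf                        # exclude the instance itself
        if labels[np.argmin(d)] == 1:        # nearest neighbour is minority
            kept.append(x)
    X_kept = np.array(kept)

    # Step 2: for each retained instance, the distance to its nearest majority
    # instance acts as a boundary radius; only minority neighbours inside that
    # radius qualify for synthesis (adaptive neighbor selection).
    pools = []
    for x in X_kept:
        r = np.linalg.norm(X_maj - x, axis=1).min()
        d = np.linalg.norm(X_kept - x, axis=1)
        candidates = X_kept[(d > 0) & (d < r)]
        if len(candidates):
            pools.append((x, candidates))

    # Step 3: SMOTE-style interpolation between a seed instance and one of
    # its qualified neighbours, repeated until n_new instances are created.
    synthetic = []
    for _ in range(n_new):
        if not pools:
            break                            # no seed has a qualified neighbour
        x, candidates = pools[rng.integers(len(pools))]
        nb = candidates[rng.integers(len(candidates))]
        synthetic.append(x + rng.random() * (nb - x))
    return np.array(synthetic)
```

Because each synthetic point is an interpolation between a retained minority seed and a minority neighbor lying inside the seed's boundary radius, the sketch never synthesizes from filtered noise instances or across the perceived decision boundary.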

Highlights

  • Class-imbalanced data occur often in machine learning: the class distribution in binary or multi-class classification problems is significantly skewed

  • The first rank is assigned to the best-performing oversampling technique and the eighth rank to the worst-performing one

  • We find that random oversampling performs worst on all three measures when k-nearest neighbors (KNN) is used as the classifier

Introduction

Class-imbalanced data occur often in machine learning: the class distribution in binary or multi-class classification problems is significantly skewed. In a binary classification problem, the majority class contains a large number of instances, while the minority class contains only a few [42]. Such problems often arise in practical applications such as bank fraudulent transaction detection [36], credit risk assessment [34], text classification [41], biomedical diagnosis [2,59] and firewall intrusion detection [5]. Owing to the prevalence of imbalanced datasets in practice and the difficulty traditional classifiers have in dealing with them, learning from class-imbalanced data has attracted the attention of many prominent researchers over the last 20 years [31], and many preprocessing methods have been put forward to deal with class imbalance.

