AGNES-SMOTE: An Oversampling Algorithm Based on Hierarchical Clustering and Improved SMOTE

Xin Wang, Qin Qin, Yue Yang, Qin Wang, Huijiao Wang, Hua Jiang, Mingsong Chen

doi:10.1155/2020/8837357

Open Access

https://doi.org/10.1155/2020/8837357

Abstract

To address the low classification accuracy on imbalanced datasets, this paper proposes AGNES-SMOTE (Agglomerative Nesting-Synthetic Minority Oversampling Technique), an oversampling algorithm based on hierarchical clustering and an improved SMOTE. Its key procedures are: hierarchically cluster the majority samples and the minority samples separately; divide the minority subclusters on the basis of the obtained majority subclusters; select "seed samples" based on the sampling weight and probability distribution of each minority subcluster; and restrict the generation of new samples to a certain area by the centroid method during sampling. AGNES-SMOTE is combined with an SVM (Support Vector Machine) to classify imbalanced datasets. Experiments on UCI datasets compare its performance with that of other algorithms from the literature. The results indicate that AGNES-SMOTE excels at synthesizing new samples and improves SVM classification performance on imbalanced datasets.
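For orientation, the baseline that AGNES-SMOTE modifies can be sketched in a few lines. The snippet below is a minimal illustration of plain SMOTE interpolation, not the improved variant proposed in the paper. The function name `smote_sample`, the parameter `k`, and the toy data are assumptions made for this example.

```python
import math
import random

def smote_sample(minority, k=3, rng=None):
    """Create one synthetic point with plain SMOTE interpolation."""
    rng = rng or random.Random()
    i = rng.randrange(len(minority))
    seed = minority[i]
    # k nearest minority neighbours of the seed (the seed itself is excluded)
    others = minority[:i] + minority[i + 1:]
    neighbours = sorted(others, key=lambda p: math.dist(seed, p))[:k]
    neighbour = rng.choice(neighbours)
    lam = rng.random()  # interpolation factor in [0, 1)
    # The new point lies on the segment between the seed and the neighbour
    return tuple(s + lam * (n - s) for s, n in zip(seed, neighbour))

# Toy minority class (illustrative values only)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
synthetic = [smote_sample(minority, k=2, rng=random.Random(i)) for i in range(5)]
```

Because every synthetic point lies on a segment between two existing minority samples, plain SMOTE ignores where the majority samples are, which is the kind of overlap the procedures above are meant to reduce.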

Highlights

An imbalanced dataset is one in which some classes have fewer instances than others
Existing oversampling algorithms mainly address between-class imbalance and neglect within-class imbalance
Several problems are also ignored: the samples to be oversampled are not selected, noise is not removed, synthetic samples overlap, and samples are distributed "marginally." To solve these problems, this paper presents AGNES-SMOTE, an oversampling algorithm based on hierarchical clustering and an improved SMOTE. The algorithm proceeds as follows: filter noise samples from the dataset; cluster the majority samples and the minority samples separately with the AGNES algorithm; divide the minority subclusters in light of the obtained majority subclusters; select samples for oversampling based on the sampling weight and probability distribution of the minority subclusters; and restrict the generation of new samples to a certain area by the centroid method
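The last two steps (weighted selection and centroid-restricted generation) might look roughly like the sketch below. This page does not give the paper's weight formula or its exact centroid construction, so both are assumptions here: the weights are a placeholder (inverse subcluster size), and the new sample is interpolated between a seed and the centroid of the seed and its nearest neighbours. All function names are hypothetical.

```python
import math
import random

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def pick_subcluster(subclusters, weights, rng):
    """Choose a minority subcluster with probability proportional to its weight."""
    return rng.choices(subclusters, weights=weights, k=1)[0]

def generate_near_centroid(subcluster, k, rng):
    """Interpolate between a seed sample and the centroid of its neighbourhood."""
    i = rng.randrange(len(subcluster))
    seed = subcluster[i]
    others = subcluster[:i] + subcluster[i + 1:]
    neighbours = sorted(others, key=lambda p: math.dist(seed, p))[:k]
    target = centroid([seed] + neighbours)
    lam = rng.random()  # interpolation factor in [0, 1)
    return tuple(s + lam * (t - s) for s, t in zip(seed, target))

rng = random.Random(0)
subclusters = [
    [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)],
    [(5.0, 5.0), (5.2, 5.1)],
]
# Placeholder weights (assumption): smaller subclusters are sampled more often
weights = [1.0 / len(c) for c in subclusters]
new_points = []
for _ in range(6):
    cluster = pick_subcluster(subclusters, weights, rng)
    new_points.append(generate_near_centroid(cluster, k=2, rng=rng))
```

Interpolating toward a local centroid keeps each synthetic sample inside the region spanned by the seed and its neighbours, which is one way to read "restrict the generation of new samples in a certain area."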

Summary

Introduction

An imbalanced dataset is one in which some classes have fewer instances than others. Compared with Cluster-SMOTE, K-means-SMOTE clusters the entire dataset, finds the overlap and avoids oversampling in unsafe areas, restricts the synthetic samples to the target area, and eliminates within-class and between-class imbalance. It avoids noise samples and attains good results. The procedures of AGNES-SMOTE are as follows: filter noise samples, adopt the AGNES algorithm to cluster the minority samples, form the minority subclusters iteratively, and consider the distribution of the majority samples during the merging process to avoid generating overlapping synthetic samples. Merging is repeated until the distance between the two closest minority subclusters exceeds the set threshold. In the experiments, AGNES-SMOTE attains better results than the compared algorithms.
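The threshold-based stopping rule can be illustrated with a small bottom-up clustering routine. This is a simplified sketch under stated assumptions: it measures the distance between clusters by their centroids (the linkage used in the paper is not given on this page), and it omits the check against the majority-sample distribution during merging. The function name `agnes` and the `threshold` value are chosen for this example only.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def agnes(points, threshold):
    """Merge the two closest clusters repeatedly and stop once the
    closest pair is farther apart than `threshold`."""
    clusters = [[p] for p in points]  # start with one cluster per sample
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # the closest subclusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy minority samples forming two well-separated groups
minority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (4.0, 4.0), (4.1, 3.9)]
subclusters = agnes(minority, threshold=1.0)
```

With this toy data and threshold, the routine ends with two subclusters, one per group. A larger threshold would merge them into a single cluster, so the threshold directly controls how fine-grained the minority subclusters are.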

Preliminary Theory

Improved SMOTE Algorithm

Experimental Design and Result Analysis

Experimental Analysis

Conclusion

Journal: Scientific Programming
Publication Date: Sep 23, 2020
Citations: 11
License type: CC BY 4.0
