Adaptive SV-Borderline SMOTE-SVM algorithm for imbalanced data classification

Haiyan Wu, Weiguo Lin, Xiaolei Chen, Jiaqi Guo

doi:10.1016/j.asoc.2023.110986

Abstract

In recent years, imbalanced data classification has emerged as a challenging task. To address it, we propose an adaptive SV-Borderline SMOTE-SVM (Synthetic Minority Oversampling Technique-Support Vector Machine) algorithm. The algorithm first uses an SVM to map the dataset into the kernel space and identify the class boundary samples, known as support vectors (SVs). Next, the neighbors of each positive-class support vector (SV+) are computed using the kernel distance. According to the class distribution of these neighbors, each SV+ sample is labeled as either “concave” or “convex”. New samples are then generated adaptively, with a different calculation approach for each of the two labels. To construct the SVM decision function without requiring an explicit expression for the new samples in the kernel space, a Gram matrix is designed. Notably, every step is designed to ensure the credibility and reliability of the new samples, and the adaptive interpolation approach helps to ensure their safety and diversity. Extensive experiments were conducted on 50 KEEL datasets to evaluate the proposed method on imbalanced data classification. Our method achieved the highest G-mean score on 33 datasets and the highest F-value on 32 datasets. These results indicate that the proposed method is more effective than the compared approaches for imbalanced data classification.
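
As an illustration of the kernel-distance neighbor step described in the abstract, the following is a minimal sketch, not the authors' implementation. It relies on the standard identity ||φ(x_i) − φ(x_j)||² = K_ii − 2K_ij + K_jj, which gives distances in the kernel space from the Gram matrix alone. The RBF kernel, the value of gamma, the toy data, and all function names are assumptions made for this example. Because the abstract does not state the rule that maps a neighbor class distribution to a “concave” or “convex” label, the sketch only reports the fraction of positive-class neighbors.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2). Kernel choice is an assumption."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def gram_matrix(X, kernel=rbf_kernel):
    """Gram matrix with K[i][j] = k(X[i], X[j])."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


def kernel_distance(K, i, j):
    """Distance between samples i and j in the kernel-induced feature space."""
    # ||phi(x_i) - phi(x_j)||^2 = K_ii - 2*K_ij + K_jj
    # Clamp at zero to absorb tiny negative values caused by rounding.
    return math.sqrt(max(K[i][i] - 2.0 * K[i][j] + K[j][j], 0.0))


def kernel_neighbors(K, i, k):
    """Indices of the k nearest neighbors of sample i by kernel distance."""
    others = [j for j in range(len(K)) if j != i]
    return sorted(others, key=lambda j: kernel_distance(K, i, j))[:k]


def positive_fraction(K, y, i, k, positive_label=1):
    """Fraction of the k kernel-space neighbors of sample i in the positive class."""
    nbrs = kernel_neighbors(K, i, k)
    return sum(1 for j in nbrs if y[j] == positive_label) / len(nbrs)


# Hypothetical toy data: label 1 is the positive (minority) class.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (1.0, 1.0), (1.1, 0.9), (0.3, 0.3)]
y = [1, 1, 1, 0, 0, 0]
K = gram_matrix(X)

neighbors_of_0 = kernel_neighbors(K, 0, k=2)
fraction_of_0 = positive_fraction(K, y, 0, k=2)
```

In a full pipeline the index passed to `positive_fraction` would come from the positive-class support vectors of a trained SVM; here sample 0 is simply assumed to be one.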

Full Text