Imbalanced data classification based on DB-SLSMOTE and random forest

Qi Han, Rui Yang, Mengjie Huang, Huiqing Wen, Shaozhi Chen, Zitong Wan

doi:10.1109/cac51589.2020.9326743

Abstract

The classification of imbalanced data has become a popular topic in machine learning in recent years. On imbalanced data, traditional classification algorithms tend to assign minority class samples to the majority class, so the classifier misclassifies many minority samples. To address this problem, this paper proposes a Density Based Safe Level Synthetic Minority Oversampling TEchnique (DB-SLSMOTE). First, the algorithm clusters the minority samples with Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Then, the Safe Level Synthetic Minority Oversampling TEchnique (Safe-Level-SMOTE) is applied to the clusters discovered by DBSCAN, which may have arbitrary shapes. Finally, the processed data are classified by Random Forest (RF). The experimental results show that DB-SLSMOTE effectively improves the classification performance of RF on minority samples in imbalanced data.
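The abstract describes a three-stage pipeline: DBSCAN on the minority class, Safe-Level-SMOTE on the resulting clusters, then a random forest. Below is a minimal pure-Python sketch of the oversampling stage only, offered as an illustration and not as the authors' implementation. It rests on assumptions the abstract does not state: the neighbour used for interpolation is drawn only from the same DBSCAN cluster, DBSCAN noise points are not oversampled, safe levels are counted over the whole data set, and one synthetic sample is produced per clustered minority point. The gap rules follow the original Safe-Level-SMOTE description. The function names and the parameters `eps`, `min_pts` and `k` are hypothetical.

```python
import math
import random

NOISE = -1  # DBSCAN label for points not assigned to any cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one cluster id (>= 0) or NOISE per point."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # core point: keep expanding
    return labels


def k_nearest(X, idx, candidates, k):
    """The k indices from candidates closest to X[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def safe_level(X, y, idx, k, minority):
    """Number of minority samples among the k nearest neighbours in the whole data set."""
    return sum(1 for j in k_nearest(X, idx, range(len(X)), k) if y[j] == minority)


def db_slsmote(X, y, minority=1, k=5, eps=1.0, min_pts=3, seed=0):
    """Hypothetical DB-SLSMOTE sketch: DBSCAN on the minority class, then
    Safe-Level-SMOTE inside each cluster. Returns only the synthetic samples."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    labels = dbscan([X[i] for i in minority_idx], eps, min_pts)

    clusters = {}
    for i, lab in zip(minority_idx, labels):
        if lab != NOISE:  # assumption: noise points are not oversampled
            clusters.setdefault(lab, []).append(i)

    synthetic = []
    for members in clusters.values():
        for p in members:
            # assumption: the neighbour n is drawn from p's own cluster
            candidates = k_nearest(X, p, members, k)
            if not candidates:
                continue
            n = rng.choice(candidates)
            sl_p = safe_level(X, y, p, k, minority)
            sl_n = safe_level(X, y, n, k, minority)
            if sl_n == 0:
                if sl_p == 0:
                    continue  # both points sit among majority samples: skip
                gap = 0.0  # ratio is infinite: duplicate p
            else:
                ratio = sl_p / sl_n
                if ratio == 1:
                    gap = rng.uniform(0.0, 1.0)
                elif ratio > 1:
                    gap = rng.uniform(0.0, 1.0 / ratio)  # stay close to p
                else:
                    gap = rng.uniform(1.0 - ratio, 1.0)  # move towards n
            synthetic.append([a + gap * (b - a) for a, b in zip(X[p], X[n])])
    return synthetic
```

The returned points, labelled as the minority class, would be appended to the training set before fitting a random forest, for example scikit-learn's `RandomForestClassifier`. Restricting interpolation to a single DBSCAN cluster is meant to keep synthetic samples inside the dense region DBSCAN found, instead of bridging separate minority groups through majority territory.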
