Abstract

Imbalanced classification is an important task in supervised learning, and the Synthetic Minority Over-sampling Technique (SMOTE) is the most common method for addressing it. However, the performance of SMOTE deteriorates in the presence of label noise. Current generalizations of SMOTE try to tackle this problem either by selecting some samples in the minority class as seed samples or by combining SMOTE with a noise filter. Unfortunately, the former approach usually introduces extra parameters that are difficult to optimize, and the latter relies heavily on the performance of the chosen noise filter. In this paper, a self-adaptive robust SMOTE, called RSMOTE, is proposed for imbalanced classification with label noise. In RSMOTE, relative density is introduced to measure the local density of every minority sample, and the non-noisy minority samples are adaptively divided into borderline samples and safe samples based on their distinguishing characteristics of relative density. In addition, the number of synthetic samples to be generated for each minority sample is reweighted according to its chaotic level. Furthermore, new samples are generated within the borderline area and the safe area respectively to enhance the separability of the boundary. RSMOTE neither relies on any specific noise filter nor introduces any extra parameters. The experimental results demonstrate that the proposed approach outperforms the comparison methods on several metrics, including Precision, Recall, Area Under the Curve (AUC), F1-measure, and G-mean. A Python implementation of RSMOTE is available at https://github.com/syxiaa/RSMOTE.
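To make the procedure sketched above concrete, the snippet below is a minimal illustration of how a relative-density-based oversampler could be organised: estimate a relative density for each minority sample, split the samples adaptively into safe and borderline groups, and generate SMOTE-style interpolations within each group. This is an illustrative sketch under our own assumptions, not the authors' reference implementation (see the GitHub link above); in particular, the names relative_density and oversample, the median split, and the 2:1 borderline weighting are hypothetical simplifications.

```python
# Hypothetical sketch of relative-density-based oversampling; not the
# authors' reference implementation of RSMOTE.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def relative_density(X_min, X_maj, k=5):
    """Ratio of each minority sample's mean distance to its k nearest
    majority neighbours over its mean distance to its k nearest minority
    neighbours; larger values indicate a 'safer' (minority-dense) region."""
    nn_min = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    nn_maj = NearestNeighbors(n_neighbors=k).fit(X_maj)
    d_min = nn_min.kneighbors(X_min)[0][:, 1:].mean(axis=1)  # skip self
    d_maj = nn_maj.kneighbors(X_min)[0].mean(axis=1)
    return d_maj / (d_min + 1e-12)

def oversample(X_min, X_maj, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    rho = relative_density(X_min, X_maj, k)
    safe = rho >= np.median(rho)        # adaptive split: safe vs. borderline
    weights = np.where(safe, 1.0, 2.0)  # let harder (borderline) points spawn more
    weights /= weights.sum()
    counts = rng.multinomial(n_new, weights)
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    synthetic = []
    for i, c in enumerate(counts):
        # interpolate towards neighbours in the same (safe/borderline) group,
        # mimicking generation within each area separately
        same_group = [j for j in idx[i][1:] if safe[j] == safe[i]] or list(idx[i][1:])
        for _ in range(c):
            j = rng.choice(same_group)
            lam = rng.random()
            synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)
```

A typical usage would be to call oversample with the minority and majority feature matrices and the number of synthetic points needed to balance the classes, then append the returned array to the training set with minority labels.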
