Abstract

Learning a classifier from imbalanced data remains a challenging problem. Oversampling methods can improve imbalanced classification at the data-preprocessing stage, and many such methods have been proposed. Nevertheless, most tend to generate unnecessary noise, create redundant synthetic samples near the class center, and rely heavily on the parameter k. To address these issues, this work presents an oversampling method based on local sets and SMOTE (LS-SMOTE). First, local sets are searched to describe the local characteristics of the imbalanced data. Second, a local-set-based noise filter is designed to remove noise and smooth the class boundary. Finally, on each local set, SMOTE interpolation between a base sample and a selected sample closest to the majority class is used to create synthetic samples. Experimental results on 12 real data sets show that LS-SMOTE outperforms representative oversampling methods when training a k-nearest-neighbor classifier.
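The interpolation step described above follows the standard SMOTE recipe: a synthetic sample is placed at a random point on the line segment between a base minority sample and a chosen neighbor. A minimal sketch of that interpolation (not the paper's full LS-SMOTE pipeline; the function name and inputs are illustrative):

```python
import random

def smote_interpolate(base, neighbor, rng=random):
    """Create one synthetic sample on the segment between a base
    minority sample and a selected neighbor (SMOTE interpolation)."""
    gap = rng.random()  # uniform gap in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]

# Example: interpolate between two 2-D minority samples.
base = [1.0, 2.0]
neighbor = [3.0, 4.0]
synthetic = smote_interpolate(base, neighbor, random.Random(0))
```

In LS-SMOTE, per the abstract, the neighbor is not an arbitrary k-nearest minority sample but the member of the local set closest to the majority class, which steers synthetic samples toward the decision boundary rather than the class center.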
