Abstract

Because class imbalance is pervasive in real-world applications, imbalanced classification has long been a topic of active theoretical and practical research in data mining and machine learning. Data-level processing occupies a prominent position because it is independent of the classifier, and the synthetic minority oversampling technique (SMOTE) is its outstanding representative. However, SMOTE considers only neighborhood information and generates new samples by linear interpolation, which can produce incorrect samples. In this paper, we propose an oversampling method based on the relative density of weighted k-nearest-neighbor samples and local shadow samples from a random synthetic affine linear combination (R-WDLS), which fully utilizes the nearest-neighbor and intraclass standard-deviation information of the minority-class samples. First, noise and outlier samples in the original data are filtered according to sample density. Then, for each retained minority sample, new points are generated around it from a Gaussian distribution to increase the diversity of the minority class. Finally, multiple such points are selected and combined to generate new samples that satisfy the required conditions. The proposed R-WDLS is validated through extensive comparison experiments against seven oversampling methods under eight classifiers on 24 real datasets. Friedman and Nemenyi post-hoc statistical tests show that R-WDLS achieves consistently optimal results under all evaluation metrics, which no other method matches.
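For context on the baseline the abstract critiques, the following is a minimal sketch of SMOTE-style linear interpolation (not the proposed R-WDLS method): each synthetic point is placed on the segment between a minority sample and one of its k nearest minority-class neighbors. The function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def smote_interpolate(minority, k=3, n_new=10, seed=None):
    """Sketch of SMOTE-style oversampling: synthesize points by linear
    interpolation between a minority sample and a random one of its
    k nearest minority-class neighbors (illustrative, not R-WDLS)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # Euclidean distances from x to every minority sample
        d = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()  # interpolation coefficient in [0, 1)
        synthetic.append(x + gap * (minority[j] - x))
    return np.array(synthetic)
```

Because every synthetic point lies on a segment between two existing minority samples, the generated set stays inside the minority class's convex hull; this is the restriction to "linear interpolation" that R-WDLS relaxes by sampling around retained points with a Gaussian distribution.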
