Abstract

Imbalanced data classification has received much attention in machine learning, and many oversampling methods exist to solve this problem. However, these methods may suffer from insufficient noise filtering, overlap between synthetic and original samples, etc., resulting in degradation of classification performance. To this end, we propose a hybrid sampling with two-step noise filtering (HSNF) method in this paper, which consists of three modules. In the first module, HSNF denoises twice according to different noise discrimination mechanisms. Note that denoising mechanism is essentially based on the Euclidean distance between samples. Then in the second module, the minority class samples are divided into two categories, boundary samples and safe samples, respectively, and a portion of the boundary majority class samples are removed. In the third module, different oversampling methods are used to synthesize instances for boundary minority class samples and safe minority class samples. Experimental results on synthetic data and benchmark datasets demonstrate the effectiveness of HSNF in comparison with several popular methods. The code of HSNF will be released.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call