Abstract
Imbalanced data classification has received much attention in machine learning, and many oversampling methods exist to solve this problem. However, these methods may suffer from insufficient noise filtering, overlap between synthetic and original samples, etc., resulting in degradation of classification performance. To this end, we propose a hybrid sampling with two-step noise filtering (HSNF) method in this paper, which consists of three modules. In the first module, HSNF denoises twice according to different noise discrimination mechanisms. Note that denoising mechanism is essentially based on the Euclidean distance between samples. Then in the second module, the minority class samples are divided into two categories, boundary samples and safe samples, respectively, and a portion of the boundary majority class samples are removed. In the third module, different oversampling methods are used to synthesize instances for boundary minority class samples and safe minority class samples. Experimental results on synthetic data and benchmark datasets demonstrate the effectiveness of HSNF in comparison with several popular methods. The code of HSNF will be released.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.