Abstract

Synthetic Minority Oversampling Technique (SMOTE) is one of the most prevalent oversampling methods in imbalanced classification. Because SMOTE generates new samples blindly, without distinguishing noisy samples, many variants have been developed that either avoid selecting label-noise samples as seed samples or remove label-noise samples after oversampling. However, these variants still interpolate linearly at a random location between each minority sample and its neighbors, without considering the relative chaotic level of the minority sample and each neighbor. In this paper, we propose a general weighting framework that designates the interpolation location of each synthetic sample by computing the chaotic levels of the seed sample and its neighbors, placing the synthetic sample closer to a safe, clean sample and farther from a chaotic one. Because this framework can be easily combined with a wide range of SMOTE variants, we call it W-SMOTEs. Extensive experiments on synthetic, UCI, and industrial datasets with different levels of label noise demonstrate that W-SMOTEs effectively reduce the number of noisy samples produced and enhance the separability between classes.
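
The weighting idea can be illustrated with a minimal Python sketch. The abstract does not define the chaotic level or the weighting rule, so both are assumptions made here for illustration only: the chaotic level of a sample is taken to be the fraction of its k nearest neighbors that carry a different label, and the interpolation step is taken to be the seed's share of the combined chaotic level. The function names `chaotic_level` and `weighted_interpolate` and the parameters `k` and `eps` are hypothetical and are not taken from the paper.

```python
import numpy as np


def chaotic_level(X, y, k=5):
    """Assumed proxy for the chaotic level: the fraction of each sample's
    k nearest neighbors (Euclidean, excluding itself) with a different label."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)          # a sample is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]    # indices of the k nearest neighbors
    return (y[nn] != y[:, None]).mean(axis=1)


def weighted_interpolate(seed, neighbor, c_seed, c_neighbor, eps=1e-8):
    """Place a synthetic sample on the segment seed -> neighbor.

    Assumed rule: the step toward the neighbor equals the seed's share of the
    combined chaotic level, so the new point lands nearer the cleaner endpoint.
    Plain SMOTE would instead draw the step uniformly from [0, 1].
    """
    w = (c_seed + eps) / (c_seed + c_neighbor + 2 * eps)
    return seed + w * (neighbor - seed)
```

With a clean seed and a chaotic neighbor the synthetic sample stays close to the seed, with the roles reversed it moves toward the neighbor, and with equal chaotic levels it falls at the midpoint. A random perturbation of the step could be layered on top to recover some of the diversity of plain SMOTE; that choice is likewise not specified in the abstract.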
