Abstract

The problem of class imbalance is prevalent in many real-world data sets, causing learning models to skew towards the majority class and produce biased predictions. Data augmentation methods, such as the well-known Synthetic Minority Over-sampling Technique (SMOTE), are commonly employed to address class imbalance by generating synthetic samples. However, SMOTE's generation mechanism is relatively constrained, resulting in insufficient diversity among the synthetic samples. To overcome this limitation, this paper generalizes classical SMOTE and introduces Multi-vector Stochastic Exploration Oversampling (MSEO). MSEO broadens the set from which synthetic samples are mapped: rather than using direction and scaling vectors determined by neighboring samples, it draws random direction vectors and scaling vectors. The generated samples can therefore escape the original linear interpolation region, enabling more flexible exploration of the sample space. We extensively evaluated the method on artificially generated datasets, multi-class real-world datasets, and an engineering dataset. The results indicate that MSEO offers significant advantages in improving classification performance and increasing the diversity of synthetic samples.
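To make the contrast concrete, the sketch below compares classical SMOTE interpolation with a generalization in the spirit the abstract describes. The specific perturbation model (Gaussian noise on the neighbor direction, a uniform per-dimension scaling vector, and the `scale_range` parameter) is an illustrative assumption, not the paper's actual formulation of MSEO.

```python
import numpy as np

def smote_sample(x, neighbor, rng):
    # Classical SMOTE: a random point on the segment from x to a
    # minority-class neighbor (fixed direction, scalar scaling).
    lam = rng.uniform(0.0, 1.0)
    return x + lam * (neighbor - x)

def mseo_like_sample(x, neighbor, rng, scale_range=(0.0, 1.5)):
    # Hypothetical MSEO-style generalization: perturb the neighbor
    # direction randomly and apply a random per-dimension scaling
    # vector, so the sample can leave the interpolation segment.
    d = neighbor - x
    direction = d + rng.normal(0.0, 0.25 * np.linalg.norm(d), size=d.shape)
    scaling = rng.uniform(*scale_range, size=d.shape)
    return x + scaling * direction

rng = np.random.default_rng(0)
x = np.array([0.0, 0.0])
nb = np.array([1.0, 1.0])
s_smote = smote_sample(x, nb, rng)   # always on the segment x -> nb
s_mseo = mseo_like_sample(x, nb, rng)  # may lie off that segment
```

Under this reading, SMOTE is recovered as the special case where the direction vector is exactly the neighbor difference and the scaling vector is a single scalar in [0, 1].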
