Abstract

Abstract The problem of data imbalance is common in reality, which greatly affects the performance of classifiers. Most of the solutions are to balance the data set by generating new minority class samples, which are faced with the problems of selecting the appropriate area for generating samples, fuzzy classification boundary and uneven distribution of samples. To solve these problems, we propose a novel oversampling algorithm named space partitioning adaptive weighted synthetic minority oversampling technique (SPAW-SMOTE). We first divide the data space into boundary space and non-boundary space based on spatial partitioning techniques. The number of samples to be generated is assigned to different spaces by the designed adaptive weighting algorithm, which is used to solve the problems of uneven distribution of samples and easy to blur the classification boundary. Finally, we also endeavor to develop a new generation algorithm to reduce the probability of overlapping samples generated when synthesizing new samples and to ensure the diversity of new samples. Experimental results on 18 real-world data sets show that the average performance (G-mean, F1-measure and Area Under Curve) of SPAW-SMOTE is significantly better than other existing oversampling techniques.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call