Abstract

In real-world binary classification problems, the number of samples belonging to each class varies. Problems in which the majority class is substantially larger than the minority class are referred to as class imbalance learning (CIL) problems. Due to the CIL problem, model performance may degrade. This paper presents a new support vector machine (SVM) model based on density weight for the binary CIL (DSVM-CIL) problem. Additionally, an improved 2-norm-based density-weighted least squares SVM for binary CIL (IDLSSVM-CIL) is also proposed to increase the training speed of DSVM-CIL. In IDLSSVM-CIL, the least squares solution is obtained by considering the 2-norm of the slack variables and solving the primal problem of DSVM-CIL with equality constraints instead of inequality constraints. The basic idea behind both algorithms is that the training data points are assigned weights during the training phase according to their class distributions. The weights are generated using a density-weighted technique (Cha et al. in Expert Syst Appl 41(7):3343–3350, 2014) to reduce the effects of class imbalance. Experimental analyses are performed on several imbalanced artificial and real-world datasets, and performance is measured using the area under the curve and the geometric mean (G-mean). The results are compared with SVM, least squares SVM, fuzzy SVM, improved fuzzy least squares SVM, affinity and class probability-based fuzzy SVM, and entropy-based fuzzy least squares SVM. Similar or better generalization results indicate the efficacy and applicability of the proposed algorithms.
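The abstract does not reproduce the exact weighting formula of Cha et al.; the following is a minimal sketch of one plausible density-weighting scheme of the kind the abstract describes, assuming a k-nearest-neighbour density estimate computed within each class and normalised per class (the function name, the use of k-NN distances, and the per-class normalisation are all illustrative assumptions, not the authors' method).

```python
import numpy as np

def density_weights(X, y, k=2):
    """Assign each training point a weight based on its local density
    within its own class (illustrative sketch, not the paper's formula).

    Because weights are normalised to sum to 1 per class, points in the
    minority class receive larger individual weights, which counteracts
    class imbalance when the weights scale the SVM penalty terms."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.zeros(len(X))
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        Xc = X[idx]
        # Brute-force pairwise distances within the class.
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)          # exclude self-distance
        kk = min(k, len(idx) - 1)
        knn = np.sort(d, axis=1)[:, :kk]     # k nearest same-class neighbours
        dens = 1.0 / (knn.mean(axis=1) + 1e-12)  # closer neighbours -> denser
        w[idx] = dens / dens.sum()           # normalise per class
    return w
```

Weights produced this way could then be passed to any weighted SVM solver (e.g. as per-sample penalty multipliers on the slack variables), which is the role the density weights play in DSVM-CIL and IDLSSVM-CIL.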
