Imbalanced data classification via support vector machines and genetic algorithms

Jair Cervantes, Xiaoou Li, Wen Yu

doi:10.1080/09540091.2014.924902

Abstract

Many real-world data sets are imbalanced: they contain a large number of patterns of one type but only a very small number of patterns of another. Standard classification methods, such as support vector machines (SVMs), do not perform well on these imbalanced data sets (IDS), because it is difficult for an SVM to find the optimal separating hyperplane when it is trained on imbalanced data. In this paper, we propose a genetic algorithm (GA)-based classification method. A draft hyperplane and its support vectors are first generated by an SVM. A GA is then applied to compensate for the imbalance in the data. Finally, an SVM is used again to find the best hyperplane from the generated data points. Compared with other popular classification algorithms, our method achieves better classification accuracy on several IDS.
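The three stages named in the abstract (draft SVM, GA compensation, final SVM) can be illustrated with a minimal sketch. The abstract does not specify the GA encoding, fitness function, or operators, so everything below is an assumption made for illustration: the toy Gaussian data, the simple linear subgradient-descent trainer standing in for a full SVM solver, the reading of "generated data points" as GA-evolved synthetic minority points seeded near the minority support vectors, the balanced-accuracy fitness, and the names `train_svm`, `balanced_accuracy`, and `evolve`.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_svm(X, y, lam=0.01, eta=0.01, epochs=30, seed=0):
    # Assumption: fixed-step stochastic subgradient descent on the
    # regularised hinge loss, a linear stand-in for a real SVM solver.
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * (dot(w, X[i]) + b) < 1
            w = [(1 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def balanced_accuracy(w, b, X, y):
    # Mean of the per-class recalls, so the minority class counts equally.
    recalls = []
    for label in (1, -1):
        idx = [i for i in range(len(y)) if y[i] == label]
        hits = sum(1 for i in idx
                   if (1 if dot(w, X[i]) + b >= 0 else -1) == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / 2

def evolve(X, y, seeds, k=10, pop_size=8, generations=5, seed=1):
    # Assumption: an individual is a set of k synthetic minority points.
    rng = random.Random(seed)

    def random_point():
        base = rng.choice(seeds)
        return [v + rng.gauss(0, 0.3) for v in base]

    def fitness(ind):
        w, b = train_svm(X + ind, y + [1] * len(ind))
        return balanced_accuracy(w, b, X, y)

    pop = [[random_point() for _ in range(k)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(scored[0]))
        elite = scored[: pop_size // 2]               # elitist selection
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, k)
            child = p1[:cut] + p2[cut:]               # one-point crossover
            child[rng.randrange(k)] = random_point()  # mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness), history

# Toy imbalanced data: 100 majority points (-1) and 10 minority points (+1).
rng = random.Random(0)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
minority = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)]
X, y = majority + minority, [-1] * 100 + [1] * 10

# Stage 1: draft hyperplane and minority points on or inside the margin.
w0, b0 = train_svm(X, y)
seeds = [x for x, t in zip(X, y) if t == 1 and dot(w0, x) + b0 <= 1] or minority
# Stage 2: the GA evolves synthetic minority points.
synthetic, history = evolve(X, y, seeds)
# Stage 3: retrain on the original data plus the evolved points.
w1, b1 = train_svm(X + synthetic, y + [1] * len(synthetic))

print("draft balanced accuracy:", balanced_accuracy(w0, b0, X, y))
print("final balanced accuracy:", balanced_accuracy(w1, b1, X, y))
```

Because the elite half of each generation is carried over unchanged and the fitness is deterministic, the best fitness recorded in `history` cannot decrease from one generation to the next. Whether the final hyperplane improves on the draft depends on the data and on the assumed settings.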

Full Text