Abstract

In most of the real world datasets, there is an imbalance in the number of samples belonging to different classes. Various pattern classification problems such as fault or disease detection involve class imbalanced data. The support vector machine (SVM) classifier becomes biased towards the majority class due to class imbalance. Moreover, in the existing SVM based techniques for class imbalance, there is no information about the distribution of data. Motivated by the idea of prior information about data distribution, a reduced universum twin support vector machine for class imbalance learning (RUTSVM-CIL) is proposed in this paper. For the first time, universum learning is incorporated with SVM to solve the problem of class imbalance. Oversampling and undersampling of data is performed to remove the imbalance in the classes. The universum data points are used to give prior information about the data. To reduce the computation time of our universum based algorithm, we use a small sized rectangular kernel matrix. The reduced kernel matrix needs less storage space, and thus applicable for large scale imbalanced datasets. Comprehensive experimentation is performed on various synthetic, real world and large scale imbalanced datasets. In comparison to the existing approaches for class imbalance, the proposed RUTSVM-CIL gives better generalization performance for most of the benchmark datasets. Also, the computation cost of RUTSVM-CIL is very less, making it suitable for real world applications.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call