Multiple Criteria Quadratic Programming (MCQP), a mathematical-programming-based classification method, was developed recently and has proved to be effective and scalable. However, its performance degrades when it learns from imbalanced data. This paper proposes a cost-sensitive MCQP (CS-MCQP) model that introduces misclassification costs into the MCQP model. Empirical tests were designed to compare the proposed model with MCQP and a selection of other classifiers on 26 imbalanced datasets from the UCI repository. The results indicate that, in terms of the AUC and GeoMean measures, the CS-MCQP model not only performs better than the optimization-based models (MCQP and SVM) but also outperforms the selected classifiers, ensemble methods, preprocessing techniques, and hybrid methods on imbalanced datasets. To validate these results statistically, Student’s t test and the Wilcoxon signed-rank test were conducted; they show that the superiority of CS-MCQP is statistically significant at the 0.05 significance level. In addition, we analyze the effect of noise, small disjuncts, and class overlap on the proposed model and conclude that CS-MCQP performs better on imbalanced data with overlapping features than on data with noise or small disjuncts.
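The comparison rests on AUC and GeoMean rather than plain accuracy. The following is a minimal sketch, not the paper's evaluation code. It assumes that the minority class is labeled 1, that GeoMean is the geometric mean of sensitivity and specificity, and that AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative. The helper names `geo_mean` and `auc` are hypothetical.

```python
import math
from typing import Sequence


def geo_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Hypothetical helper: sqrt(sensitivity * specificity), minority class = 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the majority class
    return math.sqrt(sensitivity * specificity)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: pairwise (Mann-Whitney) AUC; ties count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative toy data: 2 minority (1) vs. 8 majority (0) examples.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    always_majority = [0] * 10
    accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
    print(accuracy)                             # 0.8
    print(geo_mean(y_true, always_majority))    # 0.0
    scores = [0.9, 0.4, 0.5, 0.1, 0.2, 0.3, 0.05, 0.15, 0.25, 0.35]
    print(auc(y_true, scores))                  # 0.9375
```

In the toy run, a classifier that always predicts the majority class reaches 80% accuracy yet scores a GeoMean of 0. This is why imbalance-aware measures such as GeoMean and AUC are the natural yardsticks here.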