Although the AUC-maximizing support vector machine (AUCSVM) was developed for imbalanced classification tasks, its heavy computational burden makes it impractical, and even computationally prohibitive, for medium- or large-scale imbalanced data. In addition, in practical applications such as medical diagnosis, the minority class sometimes carries extremely important information for users or is corrupted by noise and/or outliers, which motivates us to generalize the AUC concept to reflect such importance or the upper bound on the noise or outliers. To address these issues, this study proposes ρ-AUCCVM, a fast AUC-maximizing learning machine with simultaneous outlier detection, built on both the generalized AUC metric and the core vector machine (CVM) technique. ρ-AUCCVM has two notable merits: 1) it shares the CVM's advantage, namely asymptotically linear time complexity with respect to the total number of sample pairs, together with space complexity independent of the total number of sample pairs; and 2) it can automatically determine the importance of the minority class (assuming no noise) or the upper bound on the noise or outliers. Extensive experimental results on benchmark imbalanced datasets verify these advantages of ρ-AUCCVM.