A multi-class boosting method for learning from imbalanced data

Xiaohui Yuan,Mohamed Abouelenien

doi:10.1504/ijgcrsis.2015.074722

Abstract

The acquisition of face images is usually limited due to policy and economy considerations, and hence the number of training examples of each subject varies greatly. The problem of face recognition with imbalanced training data has drawn attention of researchers and it is desirable to understand in what circumstances imbalanced dataset affects the learning outcomes, and robust methods are needed to maximise the information embedded in the training dataset without relying much on user introduced bias. In this article, we study the effects of uneven number of training images for automatic face recognition and proposed a multi-class boosting method that suppresses the face recognition errors by training an ensemble with subsets of examples. By recovering the balance among classes in the subsets, our proposed multiBoost.imb method circumvents the class skewness and demonstrates improved performance. Experiments are conducted with four popular face datasets and two synthetic datasets. The results of our method exhibits superior performance in high imbalanced scenarios compared to AdaBoost.M1, SAMME, RUSboost, SMOTEboost, SAMME with SMOTE sampling and SAMME with random undersampling. Another advantage that comes with ensemble training using subsets of examples is the significant gain in efficiency.

Full Text