Abstract

Classification of imbalanced multi-class image datasets is a challenging problem in computer vision. Most real-world datasets are imbalanced because samples are unevenly distributed across classes. The difficulty with an imbalanced dataset is that the minority class, which has far fewer samples, often goes undetected. Most traditional machine learning algorithms detect the majority class efficiently but fall short in detecting the minority class, which ultimately degrades the overall performance of the classification model. In this paper, we propose a novel combination of visual codebook generation from deep features with a non-linear Chi2 SVM classifier to tackle the imbalance problem that arises in multi-class image datasets. Low-level deep features are first extracted by transfer learning with a pre-trained ResNet-50 network and clustered using k-means; the center of each cluster becomes a visual word in the codebook. Each image is then represented as a Bag-of-Visual-Words (BOVW) feature vector derived from the histogram of visual words in the vocabulary. A detailed empirical analysis shows that the non-linear Chi2 SVM classifier is the most suitable for classifying the resulting features. With this combination of learning tools, we are able to classify multi-class imbalanced image datasets effectively, as evidenced by higher accuracy, F1-score, and AUC than state-of-the-art methods in our experiments on two challenging multi-class datasets: Graz-02 and TF-Flowers.
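The abstract outlines a four-stage pipeline: ResNet-50 feature extraction, k-means codebook construction, BOVW histogram encoding, and chi-squared SVM classification. The sketch below illustrates one way such a pipeline could be assembled; the layer choice (last convolutional block of ResNet-50), 224x224 input size, codebook size k=256, and SVM regularization C are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch of a BOVW + Chi2-SVM pipeline with ResNet-50 deep features.
# Layer choice, codebook size, and hyperparameters are assumed, not taken from the paper.
import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

# 1. Deep local descriptors via transfer learning: the last convolutional
#    block of ResNet-50 yields a 7x7 grid of 2048-d descriptors per image.
backbone = tf.keras.applications.ResNet50(weights="imagenet", include_top=False)

def local_descriptors(images):                      # images: (N, 224, 224, 3)
    x = tf.keras.applications.resnet50.preprocess_input(images.astype("float32"))
    fmap = backbone.predict(x, verbose=0)           # (N, 7, 7, 2048)
    return fmap.reshape(len(images), -1, fmap.shape[-1])  # (N, 49, 2048)

# 2. Visual codebook: k-means over all descriptors; cluster centers are the visual words.
def build_codebook(descriptors, k=256):
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(
        descriptors.reshape(-1, descriptors.shape[-1]))

# 3. Bag-of-Visual-Words: per-image normalized histogram of visual-word assignments.
def bovw_histograms(descriptors, codebook):
    k = codebook.n_clusters
    hists = np.zeros((len(descriptors), k))
    for i, d in enumerate(descriptors):
        words = codebook.predict(d)
        hists[i] = np.bincount(words, minlength=k)
    return hists / hists.sum(axis=1, keepdims=True)

# 4. Non-linear chi-squared SVM on the BOVW histograms (histograms are non-negative,
#    as required by the chi-squared kernel).
def train_chi2_svm(train_hists, train_labels, C=1.0):
    return SVC(kernel=chi2_kernel, C=C).fit(train_hists, train_labels)
```

The chi-squared kernel is passed to `SVC` as a callable, which scikit-learn evaluates on the histogram features at training and prediction time; a precomputed kernel matrix would work equally well for larger datasets.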
