Abstract

The bag-of-words (BOW) model is used in many state-of-the-art image classification methods and is especially well suited to multi-class classification. Many kinds of local features and classifiers can be used within the BOW model. However, it is unclear which local feature is both the most distinctive and the most robust, and which classifier yields the best classification performance. In this paper, we discuss the implementation choices in the BOW model and evaluate the influence of local features and classifiers on object and texture recognition within the BOW framework. To evaluate these choices, we use two popular datasets: the Xerox7 dataset (object categories) and the UIUCTex dataset (textures). Extensive experiments compare the performance of different detectors, descriptors and classifiers in terms of classification accuracy on both datasets. We find that a combined detector, which merges the MSER detector with the Hessian-Laplacian detector, is effective at finding discriminative regions. We also find that the SIFT descriptor performs better than the other descriptors for image classification, and that the SVM classifier with the EMD kernel is superior to the other classifiers. In addition, we propose an EMD spatial kernel that encodes the spatial information of local features. The EMD spatial kernel is evaluated on the Xerox7 dataset, the 4-class VOC2006 dataset and the 4-class Caltech101 dataset. The experimental results show that the proposed kernel outperforms the EMD kernel, which does not take spatial information into account, in image classification.
