Abstract
Bag-of-Visual-Words (BoVW) is still a useful image classification model when there is not enough data to use Deep Learning. In BoVW model, the practice of reducing the reconstruction errors of local features can improve the classification accuracy owing to the decrease of information loss. Many reconstruction-based coding methods are proposed to learn a visual dictionary and encode local features via minimizing the reconstruction errors of local features with constraints. Besides this, the accuracy can also be improved by learning the category-specific dictionaries and then encoding features based on these dictionaries. By considering the two practices together, we propose a simple category-specific dictionary learning method tailored for reconstruction-based feature coding. Our method can be used as a universal one to improve the classification accuracies of many reconstruction-based coding methods, which is the highlight of our method. Concretely, a universal dictionary is learned by employing a reconstruction-based coding method and then refined for each category to obtain the category-specific dictionary of this category. When encoding a feature by a category-specific dictionary, the visual words for encoding it are decided in advance by the indices, which correspond to the non-zero elements of its coding vector obtained with the universal dictionary. The effectiveness of our method is validated by observing whether there is an accuracy improvement after applying our method. Our results on Scene-15, Caltech-101, and UIUC-Sports datasets show that the accuracies of four representative coding methods are improved by about 0.3% to 2.7%, which experimentally demonstrates the universality and effectiveness of our method.
Highlights
In decades, many methods for image classification have been presented in the field of computer vision
We aim to reduce the reconstruction errors of local features from positive samples and increase the errors of features from negative samples via encoding based on categoryspecific dictionaries
When encoding a local feature by a category-specific dictionary, the visual words for encoding it are decided in advance by the indices, which correspond to the non-zero elements of its coding vector obtained with the universal dictionary
Summary
Many methods for image classification have been presented in the field of computer vision. There are two representative image classification models, i.e., Bag-of-Visual-Words (BoVW) and convolutional neural network (CNN), which have achieved many encouraging results in the past ten years. A visual dictionary is learned using local features from training images by a learning algorithm such as K -means [3] or sparse coding [4]. Local features are encoded as coding vectors by the learned dictionary. CNN [6] is a deep neural network of exploiting image space structure. It consists of convolutional layers, pooling layers, non-linear activations, and fully connected layers.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.