Abstract
Image classification is an important problem in computer vision. The sparse coding spatial pyramid matching (ScSPM) framework is widely used in this field. However, sparse coding cannot effectively handle very large training sets because of its high computational complexity, and, because it ignores the mutual dependence among local features, it produces highly variable sparse codes even for similar features. To overcome the shortcomings of previous sparse coding algorithms, we present an image classification method that replaces the sparse dictionary with a stable dictionary learned via low-complexity clustering, specifically a k-medoids clustering method optimized with k-means++ initialization. The proposed method reduces learning complexity and improves the stability of the resulting features. In the experiments, we compared the effectiveness of our method with the existing ScSPM method and its improved versions. We evaluated our approach on two diverse datasets: Caltech-101 and UIUC-Sports. The results show that our method can increase the accuracy of spatial pyramid matching, which suggests that our method is capable of improving the performance of sparse coding features.
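The dictionary-learning step described above — k-medoids clustering seeded with k-means++ — can be sketched as follows. This is a minimal illustrative implementation, not the authors' code; the function names `kmeanspp_init` and `kmedoids` and all parameter defaults are assumptions for the sketch. In practice the input rows would be 128-dimensional SIFT descriptors.

```python
import numpy as np

def kmeanspp_init(X, k, rng):
    """k-means++ seeding: choose initial medoid indices with probability
    proportional to squared distance from the already-chosen centers."""
    n = X.shape[0]
    idx = [int(rng.integers(n))]
    for _ in range(k - 1):
        # squared distance of every point to its nearest chosen center
        d2 = np.min(((X[:, None, :] - X[idx]) ** 2).sum(-1), axis=1)
        idx.append(int(rng.choice(n, p=d2 / d2.sum())))
    return np.array(idx)

def kmedoids(X, k, iters=20, seed=0):
    """Alternating k-medoids: assign each point to its nearest medoid,
    then move each medoid to the cluster member that minimizes the
    total within-cluster distance. Returns (dictionary, labels)."""
    rng = np.random.default_rng(seed)
    medoids = kmeanspp_init(X, k, rng)
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(iters):
        # assignment step: nearest medoid per point
        dists = ((X[:, None, :] - X[medoids]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # update step: pick the in-cluster point with minimal summed
        # distance to the other members of its cluster
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size == 0:
                continue
            intra = ((X[members][:, None, :] - X[members]) ** 2).sum(-1).sum(axis=1)
            new_medoids[j] = members[intra.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    # final assignment against the converged medoids
    labels = ((X[:, None, :] - X[medoids]) ** 2).sum(-1).argmin(axis=1)
    return X[medoids], labels
```

Because each medoid is an actual descriptor from the training set, the learned dictionary stays stable under small perturbations of the data, which is the property the abstract contrasts with sparse coding's highly variable codes.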
Highlights
Image classification is an important problem in computer vision
K-medoids clustering is applied to learn a dictionary from scale-invariant feature transform (SIFT) descriptors
The sparse coding spatial pyramid matching (ScSPM) method, which represents an image through SIFT extraction, sparse coding, and spatial pooling, has shown its advantage in this field
Summary
Image classification is an important problem in computer vision. Pattern recognition and machine learning techniques have been widely applied in this field; they usually extract image features and classify images according to those features. The choice of image features often significantly affects classification performance. The bag-of-features (BoF) representation methods [1,2,3] are an important application of vector quantization (VQ) and have been used in image classification and 1-D signal recognition [4]. This kind of representation is regarded as a high-level feature because it represents an image as a vector of occurrence counts over a vocabulary built on the low-level features extracted from subregions. The high-level feature concerns the interpretation or classification of a scene and may achieve higher accuracy on most images.
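The occurrence-count representation described above can be sketched in a few lines: each local descriptor is quantized to its nearest codeword in the learned vocabulary, and the image is summarized by a normalized histogram of codeword counts. This is an illustrative sketch, not code from the cited works; the function name `bof_histogram` is an assumption.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Bag-of-features encoding: quantize each local descriptor to its
    nearest codeword (hard VQ) and return the normalized histogram of
    codeword occurrence counts."""
    # squared Euclidean distance from every descriptor to every codeword
    d = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d.argmin(axis=1)                     # nearest-codeword index
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()                     # normalize to sum to 1
```

The hard argmin assignment here is exactly the step that ScSPM replaces with sparse coding; the clustering-based dictionary in this paper keeps the VQ-style codebook but makes it cheaper to learn and more stable.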