Abstract
In the standard bag-of-visual-words (BoVW) model, feature burstiness and the neglect of high-order information often weaken the discriminative power of the image representation. To address both problems, we present a novel framework, named the Salient Superpixel Network, to learn a mid-level image representation. To reduce the impact of burstiness occurring in the background region, we extract local features from the salient regions instead of the whole image, and we propose a fast saliency detection algorithm based on the Gestalt grouping principle to generate image saliency maps. To introduce high-order information, we propose a weighted second-order pooling (WSOP) method, which exploits high-order information and further alleviates the impact of burstiness in the foreground region. We conduct experiments on six image classification benchmark datasets, and the results demonstrate the effectiveness of the proposed framework with either handcrafted or off-the-shelf CNN features.
Highlights
Image classification aims to categorize a set of unlabeled images into several predefined classes according to their visual content
We further evaluate the performance of the Salient Superpixel Network (SSNet) framework with the off-the-shelf convolutional neural networks (CNN) local features
In the second set of experiments, we evaluate the performance of the SSNet framework using the off-the-shelf CNN local features
Summary
Image classification aims to categorize a set of unlabeled images into several predefined classes according to their visual content. Russakovsky et al. [12] and Angelova et al. [13] introduce location information to separate the foreground and background features and form the image representation. These methods enhance the discriminative ability of the representation, but training an object detector is time-consuming. Introducing high-order information into the design of the feature descriptor contributes little to improving the performance of image classification tasks.
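The weighted second-order pooling (WSOP) idea named in the abstract can be illustrated with a minimal sketch. The text above does not give the paper's actual weighting scheme, so this example assumes a generic form: each local descriptor contributes its outer product, scaled by a non-negative weight (for instance, a hypothetical per-feature saliency score), and the weights are normalized to sum to one. The function names `weighted_second_order_pooling` and `upper_triangle` are illustrative and do not come from the paper.

```python
import numpy as np


def weighted_second_order_pooling(features, weights):
    """Pool N local descriptors (shape N x D) into a D x D matrix.

    Assumed form (not taken from the paper): sum_i w_i * x_i x_i^T,
    with the weights normalized so that they sum to one.
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if features.ndim != 2 or weights.shape != (features.shape[0],):
        raise ValueError("expected features of shape (N, D) and weights of shape (N,)")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    w = weights / weights.sum()
    # (features * w[:, None]).T @ features equals sum_i w_i * outer(x_i, x_i)
    return (features * w[:, None]).T @ features


def upper_triangle(pooled):
    """Flatten the symmetric D x D matrix to its D*(D+1)/2 unique entries."""
    rows, cols = np.triu_indices(pooled.shape[0])
    return pooled[rows, cols]


if __name__ == "__main__":
    x = np.array([[1.0, 0.0], [0.0, 2.0]])  # two toy 2-D descriptors
    w = np.array([1.0, 1.0])                # equal (hypothetical) weights
    pooled = weighted_second_order_pooling(x, w)
    print(pooled)
    print(upper_triangle(pooled))
```

Because the pooled matrix is symmetric, keeping only the upper triangle yields a fixed-length vector without losing information. Down-weighting descriptors that occur many times is one plausible way such a scheme could dampen burstiness, but that is an assumption rather than a description of the authors' method.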