Abstract

In the standard bag-of-visual-words (BoVW) model, the burstiness of features and the neglect of high-order information often weaken the discriminative power of the image representation. To tackle both problems, we present a novel framework, named the Salient Superpixel Network (SSNet), to learn a mid-level image representation. To reduce the impact of burstiness occurring in the background region, we extract local features from the salient regions instead of the whole image, and we propose a fast saliency detection algorithm based on the Gestalt grouping principle to generate image saliency maps. To introduce high-order information, we propose a weighted second-order pooling (WSOP) method, which exploits high-order information and further alleviates the impact of burstiness in the foreground region. We conduct experiments on six image classification benchmark datasets, and the results demonstrate the effectiveness of the proposed framework with either handcrafted or off-the-shelf CNN features.
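The abstract does not state the WSOP formula, so the following is only a rough sketch of the general idea behind weighted second-order pooling, not the authors' method. It assumes the pooled representation is a weight-normalized sum of outer products of local descriptors, with one non-negative weight per descriptor (for example, a saliency value). The function names, the upper-triangle vectorization, and the power plus L2 normalization are all illustrative assumptions.

```python
import numpy as np


def weighted_second_order_pooling(features, weights):
    """Pool local descriptors into a (d, d) second-order matrix.

    features: array of shape (n, d), one local descriptor per row.
    weights:  array of shape (n,), non-negative per-descriptor weights
              (assumed here to come from something like a saliency map).
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if features.ndim != 2 or weights.shape != (features.shape[0],):
        raise ValueError("expected features (n, d) and weights (n,)")
    total = weights.sum()
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average of the outer products x_i x_i^T.
    return (features * weights[:, None]).T @ features / total


def vectorize_and_normalize(pooled, power=0.5):
    """Flatten a symmetric matrix and normalize it (assumed scheme).

    Keeps the upper triangle only, applies signed power normalization,
    then L2-normalizes the resulting vector.
    """
    rows, cols = np.triu_indices(pooled.shape[0])
    vec = pooled[rows, cols]
    vec = np.sign(vec) * np.abs(vec) ** power
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


# Hypothetical toy input: two 2-D descriptors with equal weights.
toy_features = np.array([[1.0, 0.0], [0.0, 2.0]])
toy_weights = np.array([1.0, 1.0])
toy_pooled = weighted_second_order_pooling(toy_features, toy_weights)
toy_vector = vectorize_and_normalize(toy_pooled)
```

Because the pooled matrix captures pairwise products of descriptor dimensions, it carries second-order statistics that a plain BoVW histogram does not; down-weighting repeated or low-saliency descriptors is one plausible way such a scheme could limit burstiness.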

Highlights

  • Image classification aims to categorize a set of unlabeled images into several predefined classes according to their visual content

  • We further evaluate the performance of the Salient Superpixel Network (SSNet) framework with off-the-shelf convolutional neural network (CNN) local features

  • In the second set of experiments, we evaluate the performance of the SSNet framework using the off-the-shelf CNN local features


Summary

Introduction

Image classification aims to categorize a set of unlabeled images into several predefined classes according to their visual content. Russakovsky et al. [12] and Angelova et al. [13] introduce location information to separate the foreground and background features and form the image representation. These methods have enhanced the discriminative ability of the representation; however, training an object detector is time-consuming. Introducing high-order information into the design of the feature descriptor contributes little to improving the performance of image classification tasks.

Observing
Related
Research on Mid-Level
Related Work
Methods of Extracting the Off-the-Shelf CNN Feature
Research Work about Burstiness Issue
The Proposed Method for Image Representation
Saliency Region Detection
Measuring the Gestalt Grouping Connectedness
Saliency Map Generation
The Proposed Feature Weighting Method
Weighted Second-Order Pooling
Vectorization and Normalization
The Mid-Level Image Representation Based on SSNet
Figure
Experiments and Results
Experimental Setting
Benchmark Datasets
Effectiveness Evaluation of Weighted Second-Order Pooling
Performance Analysis of Mid-Level Representation Based on SSNet
Comparison with Related BoVW Baselines
SOA Methods
Methods
Some sample images from the Food-101
Limitations
Findings
Conclusions
