Combined feature compression encoding in image retrieval

Lu Huo, Leijie Zhang

doi:10.3906/elk-1803-3

Abstract

Recently, features extracted by convolutional neural networks (CNNs) have become popular for image retrieval. In CNN representations, high-level features are usually chosen to represent images in coarse-grained datasets, while mid-level features have been successfully applied to describe images in fine-grained datasets. In this paper, we combine these different levels of features into a joint feature, yielding a robust representation that is suitable for both coarse-grained and fine-grained image retrieval datasets. In addition, because the efficiency of image retrieval is affected by the dimensionality of the index, we apply a unified subspace learning model named spectral regression (SR). We combine SR with the robust CNN representation to form a combined feature compression encoding (CFCE) method. CFCE preserves the information in the features without noticeably impacting image retrieval accuracy. We characterize how image retrieval performance changes with the compressed dimensionality of the features, and we further identify a reasonable dimensionality for indexing in image retrieval. Experiments demonstrate that our model provides state-of-the-art performance across datasets.

Highlights

Since AlexNet [1] broke many records, convolutional neural networks (CNNs) have achieved great success in a number of computer vision tasks, including object detection [2], human action recognition [3], visual recognition [4], and semantic segmentation [5].
This paper focuses on generating a compact representation that is well suited for image retrieval.
After many experiments, we find a regular pattern in how image retrieval performance changes as the compressed dimensionality of the features grows.

Summary

Introduction

Since AlexNet [1] broke many records, convolutional neural networks (CNNs) have achieved great success in a number of computer vision tasks, including object detection [2], human action recognition [3], visual recognition [4], and semantic segmentation [5]. Traditional state-of-the-art image retrieval systems rely on hand-crafted features. Local representations, such as the scale-invariant feature transform (SIFT) [15] and local binary patterns (LBP) [16], can be aggregated into fixed-length vectors that describe a whole image for retrieval. A high-layer filter (FC features) may represent a mixture of patterns, which greatly decreases the performance of fine-grained image retrieval [26]. These two types of features may not be applicable to different types of datasets. An alternative approach is to use subspace learning algorithms. This approach acts as a dimensionality reduction method that discovers the discriminant structure in feature space and preserves the information in the features without noticeably impacting image retrieval accuracy.
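The idea of joining two feature levels and then compressing them with a subspace method can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the mid-level and high-level descriptors are already extracted as NumPy arrays, and it uses one supervised variant of spectral regression (ridge regression onto class-indicator responses orthogonalized against the all-ones vector). The function names, the regularization weight `alpha`, and the synthetic data are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def joint_feature(mid, high):
    """Normalize each level separately, concatenate, then renormalize."""
    return l2_normalize(np.hstack([l2_normalize(mid), l2_normalize(high)]))

def sr_responses(labels):
    """Class-indicator vectors made orthogonal to the all-ones vector.

    For c classes this yields c - 1 response vectors, because the
    indicators sum to the all-ones vector and the last one vanishes.
    """
    n = labels.shape[0]
    basis = [np.ones(n) / np.sqrt(n)]
    for c in np.unique(labels):
        v = (labels == c).astype(float)
        for b in basis:                      # Gram-Schmidt step
            v = v - (v @ b) * b
        norm = np.linalg.norm(v)
        if norm > 1e-8:
            basis.append(v / norm)
    return np.stack(basis[1:], axis=1)       # shape: (n, c - 1)

def fit_sr(X, labels, alpha=1.0):
    """Ridge regression from centered features onto the responses."""
    mean = X.mean(axis=0)
    Xc = X - mean
    Y = sr_responses(labels)
    d = X.shape[1]
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ Y)
    return mean, W

def compress(X, mean, W):
    """Project features into the learned subspace and renormalize."""
    return l2_normalize((X - mean) @ W)

def rank(query, database):
    """Database indices sorted by decreasing cosine similarity."""
    return np.argsort(-(database @ query))

# Synthetic stand-ins for CNN descriptors (assumption: 3 classes, 20 images each).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
mid = rng.normal(size=(60, 64)) + labels[:, None]    # hypothetical mid-level features
high = rng.normal(size=(60, 32)) + labels[:, None]   # hypothetical high-level features

joint = joint_feature(mid, high)                     # 96-dimensional joint feature
mean, W = fit_sr(joint, labels, alpha=0.1)
codes = compress(joint, mean, W)                     # compact index entries
order = rank(codes[0], codes)                        # ranking for the first image
```

In this variant the compressed dimensionality is at most the number of classes minus one, so it cannot be varied freely; studying a range of target dimensionalities, as the paper does, would require a different response construction than the one assumed here.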

Spectral regression

Dimensionality reduction and encoding method

Comparison of layer performance

Feature compression using subspace learning

Findings

Conclusion