Abstract

Current retrieval methods based on deep convolutional features cannot fully exploit the characteristics of salient image regions, nor can they effectively suppress background noise, so retrieving objects in cluttered scenes remains a challenging task. To address this problem, we propose a new image retrieval method that employs a novel feature aggregation approach driven by an attention mechanism and combines local and global features. The method first extracts global and local features from the input image and then selects keypoints from the local features using the attention mechanism. The feature aggregation mechanism then aggregates the selected keypoints into a compact vector representation according to the scores assigned by the attention mechanism. The core of the aggregation mechanism is that features with high scores participate in the residual operations of all cluster centers. Finally, we obtain the improved image representation by fusing the aggregated feature descriptor with the global feature of the input image. To evaluate the proposed method, we carried out a series of experiments on large-scale image datasets and compared it with other state-of-the-art methods. The experiments show that the method substantially improves both retrieval precision and computational efficiency.
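
As a rough sketch of the aggregation and fusion steps described above (not the paper's exact implementation: the array shapes, the score threshold, and the concatenation-based fusion are illustrative assumptions), attention-weighted residual aggregation over all cluster centers can be written as:

```python
import numpy as np

def attention_aggregate(local_feats, attn_scores, centers, score_thresh=0.5):
    """Sketch of attention-weighted aggregation of local features.

    local_feats : (N, D) local descriptors from a CNN feature map
    attn_scores : (N,)   attention score of each descriptor
    centers     : (K, D) learned cluster centers

    Keypoints whose score exceeds `score_thresh` contribute residuals
    to *all* K cluster centers, weighted by their attention score.
    """
    keep = attn_scores > score_thresh            # attention-based keypoint selection
    feats, scores = local_feats[keep], attn_scores[keep]

    # Residuals of each selected feature w.r.t. every cluster center: (M, K, D)
    residuals = feats[:, None, :] - centers[None, :, :]
    # Weight residuals by attention score and sum over features -> (K, D)
    agg = (scores[:, None, None] * residuals).sum(axis=0)

    agg = agg.reshape(-1)                        # flatten to a K*D vector
    return agg / (np.linalg.norm(agg) + 1e-12)   # L2-normalise

def fuse(aggregated, global_feat, alpha=0.5):
    """Fuse the aggregated local descriptor with the global descriptor
    by weighted concatenation (one simple fusion choice)."""
    g = global_feat / (np.linalg.norm(global_feat) + 1e-12)
    fused = np.concatenate([alpha * aggregated, (1 - alpha) * g])
    return fused / (np.linalg.norm(fused) + 1e-12)
```

A query vector produced this way can then be compared against the database vectors with a standard nearest-neighbor search.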

Highlights

  • Content-based image retrieval (CBIR) has been a research focus in the field of computer vision

  • Feature extraction has evolved from shallow-layer features based on the scale-invariant feature transform (SIFT) [1] and speeded-up robust features (SURF) [2], combined with embedding and encoding methods such as bag of words (BOW) [3,4], Fisher vector (FV) [5], and vector of locally aggregated descriptors (VLAD) [6], to deep-layer features extracted by deep convolutional neural networks

  • We propose a new large-scale image retrieval method based on aggregated local feature descriptors and a global feature descriptor, exploiting the advantages of combining the two descriptors


Summary

Introduction

Content-based image retrieval (CBIR) has been a research focus in the field of computer vision. It represents an image as a vector via feature extraction algorithms and uses nearest-neighbor search to find images similar to a given query image; among these components, the feature extraction algorithm plays the key role in retrieval performance. In order to extract image features with more discriminability and form an effective image representation, a great deal of research has been devoted to feature extraction algorithms. These algorithms have gone through a development process from extracting shallow-layer features based on the scale-invariant feature transform (SIFT) [1] and speeded-up robust features (SURF) [2], combined with embedding and encoding methods such as bag of words (BOW) [3,4], Fisher vector (FV) [5], and vector of locally aggregated descriptors (VLAD) [6], to extracting deep-layer features based on deep convolutional neural networks.
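
The retrieval step itself reduces to a nearest-neighbor search over such image vectors. A minimal exhaustive cosine-similarity search, assuming the representations are stored as plain NumPy arrays (an approximate index would typically replace this at larger scale), might look like this:

```python
import numpy as np

def retrieve(query_vec, db_vecs, top_k=5):
    """Minimal nearest-neighbor retrieval over image representation vectors.

    query_vec : (D,)   representation of the query image
    db_vecs   : (N, D) representations of the database images

    Returns the indices of the top_k most similar database images.
    """
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    db = db_vecs / (np.linalg.norm(db_vecs, axis=1, keepdims=True) + 1e-12)
    sims = db @ q                       # cosine similarity with each database image
    return np.argsort(-sims)[:top_k]    # indices sorted by decreasing similarity
```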

