Abstract

This study proposes a novel algorithm that enhances the distinctiveness of the traditional vector of locally aggregated descriptors (VLAD) by using the spatial distribution cues of local features. The algorithm introduces a new method to compute the spatial distribution entropy (SDE) of clusters. Unlike conventional methods, which use only spatial coordinate statistics, this algorithm considers the distribution of the full spatial information provided by local feature detectors. For each cluster, the spatial distribution is computed as a histogram of the spatial locations, scales, and orientations of all local features inside the cluster. Entropy is calculated from the spatial distributions of all clusters of an image to create a distribution function, which is then normalized and concatenated with the VLAD vector to generate the final representation. Image retrieval and classification experiments are performed on public datasets. Experimental results show that the proposed algorithm achieves retrieval performance better than or comparable to that of several state-of-the-art algorithms. In addition, we extend SDE to convolutional neural network (CNN) features, which further improves the image retrieval results of CNN features.
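The pipeline described in the abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the bin counts, the scale range, the use of a single joint histogram, base-2 entropy, and L2 normalization of the entropy vector are all choices made here for illustration, since the abstract does not specify them.

```python
import numpy as np

def spatial_distribution_entropy(assignments, x, y, scale, angle, n_clusters,
                                 bins=(4, 4, 2, 4), scale_range=(0.0, 8.0)):
    """Entropy (in bits) of each cluster's joint (x, y, scale, angle) histogram.

    Assumed inputs (hypothetical conventions): x and y normalized to [0, 1],
    angle in [0, 2*pi], and one cluster index per local feature in `assignments`.
    """
    ranges = [(0.0, 1.0), (0.0, 1.0), scale_range, (0.0, 2.0 * np.pi)]
    feats = np.stack([x, y, scale, angle], axis=1).astype(float)
    # Clip so that no feature falls outside the histogram range and is dropped.
    for dim, (lo, hi) in enumerate(ranges):
        feats[:, dim] = np.clip(feats[:, dim], lo, hi)

    assignments = np.asarray(assignments)
    sde = np.zeros(n_clusters)
    for k in range(n_clusters):
        members = feats[assignments == k]
        if len(members) == 0:
            continue  # empty cluster: entropy is left at 0
        hist, _ = np.histogramdd(members, bins=bins, range=ranges)
        p = hist.ravel() / hist.sum()
        p = p[p > 0]  # skip empty bins, since 0 * log(0) is taken as 0
        sde[k] = -np.sum(p * np.log2(p))
    return sde

def append_sde(vlad_vector, sde):
    """L2-normalize the entropy vector (an assumption) and append it to VLAD."""
    norm = np.linalg.norm(sde)
    if norm > 0:
        sde = sde / norm
    return np.concatenate([vlad_vector, sde])
```

In this sketch, a cluster whose features all fall into one histogram bin gets entropy 0, while a cluster whose features spread evenly over two bins gets entropy 1 bit, so the appended values describe how dispersed each visual word is across position, scale, and orientation.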

Highlights

  • With the rapid development of camera and internet technology, the number of large image and video databases continues to grow

  • We found that combining power normalization [8] with L2 normalization [8] is an excellent choice for normalizing the vector of locally aggregated descriptors (VLAD) representation

  • Our SDEVLAD significantly improved the result by adding spatial information to VLAD
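
The normalization mentioned in the highlights, power normalization followed by L2 normalization, can be sketched as below. The function name and the exponent `alpha = 0.5` are assumptions for illustration; the text here does not state the exponent the authors used.

```python
import numpy as np

def power_l2_normalize(v, alpha=0.5):
    """Signed power normalization followed by L2 normalization.

    `alpha` is an assumed value; 0.5 (signed square root) is a common choice.
    """
    v = np.asarray(v, dtype=float)
    # Power step: shrink large components while keeping their sign.
    v = np.sign(v) * np.abs(v) ** alpha
    # L2 step: scale to unit length, leaving an all-zero vector unchanged.
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

The power step dampens components that dominate the vector, and the L2 step makes vectors from images with different numbers of local features comparable.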


Summary

Introduction

With the rapid development of camera and internet technology, the number of large image and video databases continues to grow. Many visual search systems are available for retrieving multimedia content relevant to a query, and the simplest and most direct method relies on the textual labels attached to the multimedia content. These systems suffer significantly from the semantic gap [1]. The bag-of-words (BoW) descriptor represents the distribution of visual words and shows considerable distinctiveness and robustness; this descriptor has been widely used in the field of content-based image retrieval (CBIR). Based on these pioneering works, aggregated vector-based methods emerged, including vector quantization [3], sparse coding [4], locality-constrained linear coding [5], and soft assignment [6]. The vector of locally aggregated descriptors (VLAD) [8] is one of the most widely adopted aggregated vector-based methods.
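
For readers unfamiliar with VLAD, the basic aggregation step can be sketched as follows: each local descriptor is assigned to its nearest codebook centroid, and the residuals (descriptor minus centroid) are summed per centroid. This is a minimal sketch of standard VLAD aggregation only; the function name is hypothetical, the codebook is assumed to be given (for example from k-means), and no normalization is applied.

```python
import numpy as np

def vlad_aggregate(descriptors, centroids):
    """Return the unnormalized VLAD vector and the nearest-centroid indices.

    descriptors: (n, d) array of local descriptors from one image.
    centroids:   (k, d) array, an assumed pre-trained codebook.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared Euclidean distance from every descriptor to every centroid.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)

    k, d = centroids.shape
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # Accumulate the residuals of all descriptors assigned to centroid i.
            vlad[i] = (members - centroids[i]).sum(axis=0)
    return vlad.ravel(), nearest
```

The resulting vector has length k × d and carries no information about where in the image the descriptors were found.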

