Abstract

Hashing is a widely adopted approximate nearest neighbor search method for large-scale image retrieval. Conventional learning-based hashing algorithms employ end-to-end representation learning, which is a one-off technique that produces a single code per image. Because of the tradeoff between efficiency and performance, these methods must increase code length to improve performance, which raises their computational complexity and sacrifices efficiency. Motivated by the human “nonsalient-to-salient” attention scheme, we propose a recursive hashing mechanism that improves the efficiency of binary codes by mapping progressively expanded salient regions to a series of binary codes. These salient regions are generated by two models: a conventional saliency model based on bottom-up, saliency-driven attention and a semantic-guided saliency model based on top-down, task-driven attention. We then perform long-range temporal modeling of the resulting sequence of salient regions with a graph-based recurrent deep network to obtain more refined, representative features. Later output nodes inherit aggregated information from all previous nodes and extract discriminative features from increasingly salient regions. As a result, the network captures more significant information and offers satisfactory scalability. The proposed recursive hashing neural network is trained end-to-end with a triplet ranking loss. Extensive experiments on several image retrieval benchmarks demonstrate the scalability of our method and its strong performance compared with state-of-the-art methods.
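To make the retrieval side of this idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) a standard hinge-style triplet ranking loss on relaxed, real-valued network outputs, (2) sign quantization to {-1, +1} bits, and (3) that the per-stage codes of the recursive network are concatenated, so a query can be answered with the first k stages for a shorter code or with more stages for a longer, more refined one. The helper names `triplet_ranking_loss`, `binarize`, `hamming`, `truncate`, and `rank` are hypothetical.

```python
from typing import List, Sequence


def triplet_ranking_loss(anchor: Sequence[float], positive: Sequence[float],
                         negative: Sequence[float], margin: float = 1.0) -> float:
    """Assumed hinge triplet loss on relaxed codes:
    max(0, margin + ||a - p||^2 - ||a - n||^2)."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, margin + d_pos - d_neg)


def binarize(relaxed: Sequence[float]) -> List[int]:
    """Quantize relaxed outputs to {-1, +1} bits with the sign function."""
    return [1 if x >= 0 else -1 for x in relaxed]


def hamming(b1: Sequence[int], b2: Sequence[int]) -> int:
    """Number of positions where two equal-length codes differ."""
    return sum(1 for x, y in zip(b1, b2) if x != y)


def truncate(stage_codes: Sequence[Sequence[int]], k: int) -> List[int]:
    """Hypothetical: concatenate the codes of the first k recursive stages."""
    out: List[int] = []
    for code in stage_codes[:k]:
        out.extend(code)
    return out


def rank(query_stages: Sequence[Sequence[int]],
         database: Sequence[Sequence[Sequence[int]]], k: int) -> List[int]:
    """Rank database items by Hamming distance using the first k stages
    (ties broken by database index)."""
    q = truncate(query_stages, k)
    dists = [(hamming(q, truncate(item, k)), idx)
             for idx, item in enumerate(database)]
    return [idx for _, idx in sorted(dists)]


if __name__ == "__main__":
    query = [[1, -1], [1, 1]]          # two stages of 2-bit codes
    db = [[[1, -1], [-1, -1]],
          [[-1, 1], [1, 1]],
          [[1, -1], [1, 1]]]
    print(rank(query, db, k=1))        # short code: items 0 and 2 tie
    print(rank(query, db, k=2))        # longer code separates them
```

In this toy example, the one-stage (2-bit) code cannot distinguish items 0 and 2, while adding the second stage ranks item 2 first. This mirrors, in a simplified way, the idea that later stages add discriminative information without retraining for a new code length.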
