Abstract

The low storage cost and strong representation capability of hash codes have made hashing techniques very popular for image retrieval. Several existing deep hashing methods focus on general image retrieval while neglecting fine-grained image retrieval. Recently, some fine-grained hashing methods have been proposed to capture the subtle differences, but they mainly rely on single-modality visual features to localize discriminative regions and ignore semantic information. In this letter, we propose a correlation filtering hashing (CFH) method to learn discrete binary codes, which takes advantage of the cross-modal correlation between semantic information and visual features for discriminative region localization. Specifically, we utilize a feature pyramid network to learn multi-level visual features. Subsequently, the label vector is embedded into the visual space and used as a correlation filter on the feature maps to capture the latent location of objects. Finally, we perform global average pooling over the output maps and concatenate the features of different levels to produce the hash codes of query images. Extensive experiments on two fine-grained datasets show that the proposed CFH outperforms state-of-the-art hashing methods.
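
The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `correlation_filter_hash`, the per-level embedding matrices `embed_mats`, the projection matrix `proj`, the softmax normalization of the response map, and the sign binarization are all assumptions made for illustration, since the abstract does not specify them. The feature pyramid network is replaced by precomputed feature maps.

```python
import numpy as np

def correlation_filter_hash(feature_maps, label_vec, embed_mats, proj):
    """Illustrative sketch of label-guided correlation filtering for hashing.

    feature_maps: list of arrays, one per pyramid level, each of shape (C, H, W)
    label_vec:    array of shape (K,), e.g. a one-hot class label
    embed_mats:   list of arrays of shape (C, K), hypothetical label embeddings
    proj:         array of shape (n_bits, total_C), hypothetical hash projection
    """
    pooled = []
    for fmap, embed in zip(feature_maps, embed_mats):
        # Embed the label vector into the visual (channel) space.
        filt = embed @ label_vec                                # (C,)
        # Correlate the embedded label with every spatial position.
        response = np.tensordot(filt, fmap, axes=([0], [0]))    # (H, W)
        # Assumption: normalize the response into spatial weights via softmax.
        weights = np.exp(response - response.max())
        weights /= weights.sum()
        # Reweight the feature map, then apply global average pooling.
        filtered = fmap * weights[None, :, :]                   # (C, H, W)
        pooled.append(filtered.mean(axis=(1, 2)))               # (C,)
    # Concatenate the pooled features of all levels.
    feature = np.concatenate(pooled)
    # Assumption: binarize a linear projection with the sign function.
    return np.where(proj @ feature >= 0, 1, -1)

# Toy usage with random stand-ins for learned quantities.
rng = np.random.default_rng(0)
maps = [rng.standard_normal((8, s, s)) for s in (16, 8, 4)]
label = np.eye(5)[2]
embeds = [rng.standard_normal((8, 5)) for _ in maps]
proj = rng.standard_normal((12, 24))
codes = correlation_filter_hash(maps, label, embeds, proj)
```

Here `codes` is a 12-dimensional vector with entries in {-1, +1}. In a trained model the embedding and projection weights would be learned rather than sampled at random.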
