Abstract

Cross-modal hashing has received intensive attention due to its low computation cost and high storage efficiency in cross-modal retrieval tasks. Most previous cross-modal hashing methods focus on extracting correlated binary codes from pairwise labels, but largely ignore the semantic categories of cross-modal data. Human perception, in contrast, exploits category information to connect cross-modal samples. Inspired by this observation, we propose to embed category information into hash codes. More specifically, we introduce a semantic prediction loss into our framework to enhance hash codes with category supervision. In addition, there is always a large gap between features from different modalities (e.g., text and images), which can lead cross-modal hashing to link irrelevant features during retrieval. To address this issue, this paper proposes the Dual-Supervised Attention Network for Deep Hashing (DSADH), which learns the cross-modal relationship via a carefully designed attention mechanism. Our network applies a cross-modal attention block to encode rich and relevant features for learning compact hash codes. Extensive experiments on three challenging benchmarks demonstrate that our method significantly improves retrieval results.
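
The two main ingredients described above, a cross-modal attention block and a semantic prediction loss applied to the hash codes, can be illustrated with a minimal PyTorch-style sketch. This is not the authors' DSADH implementation: the layer sizes, the multi-head attention formulation, the loss terms, and their weighting are all illustrative assumptions.

```python
# Minimal sketch of cross-modal attention + category-supervised hashing.
# All dimensions, the attention formulation, and the loss design are
# illustrative assumptions, not the DSADH architecture from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttentionHashing(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=1386, hidden=512,
                 hash_bits=64, num_classes=24):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        # Cross-modal attention: each modality attends to the other one.
        self.attn_i2t = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.attn_t2i = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.hash_img = nn.Linear(hidden, hash_bits)
        self.hash_txt = nn.Linear(hidden, hash_bits)
        # Semantic prediction head: supervises hash codes with category labels.
        self.classifier = nn.Linear(hash_bits, num_classes)

    def forward(self, img_feat, txt_feat):
        img = self.img_proj(img_feat).unsqueeze(1)  # (B, 1, hidden)
        txt = self.txt_proj(txt_feat).unsqueeze(1)  # (B, 1, hidden)
        # Image features query text features and vice versa.
        img_att, _ = self.attn_i2t(img, txt, txt)
        txt_att, _ = self.attn_t2i(txt, img, img)
        # tanh keeps relaxed hash codes in (-1, 1); sign() binarizes at test time.
        h_img = torch.tanh(self.hash_img(img_att.squeeze(1)))
        h_txt = torch.tanh(self.hash_txt(txt_att.squeeze(1)))
        return h_img, h_txt, self.classifier(h_img), self.classifier(h_txt)

def loss_fn(h_img, h_txt, logits_img, logits_txt, labels, alpha=1.0):
    # Pairwise term: codes of matched image/text pairs should agree.
    sim_loss = F.mse_loss(h_img, h_txt)
    # Semantic prediction loss: hash codes must also predict the (multi-hot)
    # category labels of the sample.
    cls_loss = (F.binary_cross_entropy_with_logits(logits_img, labels) +
                F.binary_cross_entropy_with_logits(logits_txt, labels))
    return sim_loss + alpha * cls_loss
```

At retrieval time the relaxed real-valued codes would be binarized, e.g. with torch.sign(), and compared by Hamming distance.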
