Abstract

Deep cross-modal hashing enables flexible and efficient large-scale cross-modal retrieval. Existing deep-hashing-based cross-modal retrieval methods aim to learn a unified hash representation for different modalities under pair-wise correlation supervision, and then encode out-of-sample data via modality-specific hashing networks. However, these methods do not sufficiently account for the semantic gap and the distribution shift between modalities, so the hash codes of different modalities cannot be unified as expected. Moreover, hashing is a discrete optimization problem that has not been well solved within deep neural networks. We therefore propose the Discrete Fusion Adversarial Hashing (DFAH) network for cross-modal retrieval to address these issues. In DFAH, a Modality-Specific Feature Extractor captures image and text features with pair-wise supervision. In particular, a Fusion Learner learns the unified hash code, enhancing the correlation of heterogeneous modalities via an embedding strategy. Meanwhile, a Modality Discriminator cooperates with the Modality-Specific Feature Extractor in an adversarial way to adapt to the distribution shift. In addition, we design an efficient discrete optimization strategy that avoids the quantization error introduced by relaxation in deep neural frameworks. Finally, experimental results and analysis on several popular datasets show that DFAH outperforms state-of-the-art cross-modal retrieval methods.
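The retrieval stage the abstract describes can be illustrated with a minimal sketch (not the authors' code): assuming each modality network has already mapped items to real-valued embeddings, a sign function binarizes them into hash codes, and retrieval ranks database codes by Hamming distance to the query code.

```python
# Minimal sketch of cross-modal hash retrieval, assuming the modality
# networks have already produced real-valued embeddings. All names here
# are illustrative, not from the DFAH paper.

def binarize(embedding):
    """Quantize a real-valued embedding to a +/-1 hash code (sign function)."""
    return [1 if x >= 0 else -1 for x in embedding]

def hamming(a, b):
    """Number of differing bits between two +/-1 hash codes."""
    return sum(1 for x, y in zip(a, b) if x != y)

def retrieve(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(range(len(db_codes)),
                  key=lambda i: hamming(query_code, db_codes[i]))

# Toy example: a text-query embedding searched against three image embeddings.
query = binarize([0.8, -0.2, 0.5, -0.9])
db = [binarize(e) for e in ([0.7, -0.1, 0.4, -0.8],    # near-duplicate
                            [-0.3, 0.6, -0.2, 0.9],    # opposite signs
                            [0.2, -0.5, -0.1, -0.3])]  # partial match
print(retrieve(query, db))  # nearest first: [0, 2, 1]
```

Because both modalities are quantized into the same Hamming space, a query from one modality (e.g. text) can be matched directly against codes from the other (e.g. images); the difficulty the paper addresses is making those codes agree across modalities despite the semantic gap and the non-differentiable sign step.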
