Deep hashing has attracted considerable interest and has shown impressive performance in retrieval tasks. However, most existing hashing techniques rely solely on binary similarity criteria to assess the semantic relationships between multi-label instances, which makes it difficult to bridge the feature gap across modalities. In this paper, we propose semantic decomposition and enhancement hashing (SDEH), which extensively explores the multi-label semantic information shared by different modalities for cross-modal retrieval. Specifically, we first introduce two independent attention-based feature learning subnetworks to capture modality-specific features with both global and local details. Subsequently, we extract semantic features from multi-label vectors by decomposing the semantic information shared among multimodal features, so that associations between different modalities can be established. Finally, we jointly learn common hash code representations of the multimodal information under the guidance of quadruple losses, making the hash codes informative while preserving multilevel semantic relationships and feature distribution consistency. Comprehensive experiments on four widely used multimodal datasets demonstrate the effectiveness of the proposed SDEH.
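To make the described pipeline concrete, the following is a minimal PyTorch-style sketch of the three stages the abstract outlines: attention-based modality-specific encoders, decomposition of multi-label vectors into a shared semantic space, and a common hash layer. All module names, dimensions, and the choice of self-attention are assumptions for illustration, not the paper's actual implementation; the quadruple losses are likewise omitted here.

```python
import torch
import torch.nn as nn

class AttentionEncoder(nn.Module):
    """Modality-specific encoder mixing global features with attended local details (assumed design)."""
    def __init__(self, in_dim, hid_dim=512):
        super().__init__()
        self.proj = nn.Linear(in_dim, hid_dim)
        self.attn = nn.MultiheadAttention(hid_dim, num_heads=4, batch_first=True)

    def forward(self, x):                       # x: (batch, in_dim)
        h = self.proj(x).unsqueeze(1)           # (batch, 1, hid_dim)
        a, _ = self.attn(h, h, h)               # attention-based re-weighting
        return (h + a).squeeze(1)               # fuse global and attended features

class SDEHSketch(nn.Module):
    """Hypothetical skeleton of SDEH: two encoders, label decomposition, shared hash layer."""
    def __init__(self, img_dim, txt_dim, num_labels, code_len=64, hid_dim=512):
        super().__init__()
        self.img_enc = AttentionEncoder(img_dim, hid_dim)
        self.txt_enc = AttentionEncoder(txt_dim, hid_dim)
        # Project multi-label vectors into the same space to share semantics across modalities.
        self.label_dec = nn.Linear(num_labels, hid_dim)
        # Shared hash layer producing relaxed (continuous) codes in [-1, 1].
        self.hash = nn.Sequential(nn.Linear(hid_dim, code_len), nn.Tanh())

    def forward(self, img, txt, labels):
        f_img, f_txt = self.img_enc(img), self.txt_enc(txt)
        f_sem = self.label_dec(labels.float())
        return self.hash(f_img), self.hash(f_txt), self.hash(f_sem)

# Usage: binary codes at retrieval time are obtained with sign(), as is standard in deep hashing.
model = SDEHSketch(img_dim=4096, txt_dim=1386, num_labels=24)
b_img, b_txt, b_sem = model(torch.randn(8, 4096), torch.randn(8, 1386),
                            torch.randint(0, 2, (8, 24)))
codes = torch.sign(b_img)                       # {-1, +1} hash codes for the image modality
```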