Abstract

Deep cross-modal hashing has recently received extensive attention in multimedia retrieval, because hashing methods offer low storage consumption and high search efficiency while deep neural networks provide powerful feature extraction. However, existing methods tend to ignore the latent relationships between heterogeneous data when learning a common semantic subspace, and they fail to retain important semantic information when mining deep correlations. In this paper, we use an attention mechanism that focuses on the characteristics of associated features to construct an attention-aware semantic fusion matrix, which integrates important information from different modalities. We introduce a novel network that passes the extracted features through the attention module to efficiently encode rich and relevant features, and that generates hash codes under the self-supervision of the proposed attention-aware semantic fusion matrix. Experimental results and detailed analysis show that our method achieves better retrieval performance on three popular datasets than recent unsupervised cross-modal hashing methods.
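
The abstract does not spell out how the fusion matrix or the hash codes are computed, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' network. It assumes that each modality yields a pairwise cosine similarity matrix, that fusion is a fixed convex combination with a hypothetical weight `alpha` (standing in for the learned attention weighting, which is not reproduced here), and that retrieval ranks database items by Hamming distance between sign-thresholded codes. All function names are illustrative.

```python
import math


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of `features`."""
    # Guard against zero vectors by treating a zero norm as 1.0.
    norms = [math.sqrt(sum(x * x for x in row)) or 1.0 for row in features]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv)
         for v, nv in zip(features, norms)]
        for u, nu in zip(features, norms)
    ]


def fuse_similarity(sim_image, sim_text, alpha=0.5):
    """Assumed fusion rule: convex combination of two similarity matrices.

    `alpha` is a hypothetical fixed weight; the paper uses an attention
    mechanism here, which this sketch does not implement.
    """
    return [
        [alpha * a + (1.0 - alpha) * b for a, b in zip(row_i, row_t)]
        for row_i, row_t in zip(sim_image, sim_text)
    ]


def binarize(real_codes):
    """Threshold real-valued codes at zero to obtain +1/-1 hash codes."""
    return [[1 if x >= 0 else -1 for x in row] for row in real_codes]


def hamming_distance(code_a, code_b):
    """Number of positions at which two equal-length codes differ."""
    return sum(1 for x, y in zip(code_a, code_b) if x != y)


def rank_by_hamming(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )
```

In an unsupervised setting of this kind, a fused matrix such as the one returned by `fuse_similarity` would typically serve as the target that the similarities between learned hash codes are trained to match; the training loop is omitted because the abstract gives no details about it.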
