Abstract

Due to the rapid development of deep learning, cross-modal retrieval has achieved significant progress in recent years. In particular, cross-modal hashing has attracted considerable attention in multi-modal retrieval applications because of its low storage cost and fast retrieval speed. However, it remains a challenging problem due to the semantic heterogeneity gap between different modalities. To further narrow this gap and obtain more effective hash codes, we put forward a novel mask deep cross-modal hashing (MDCH) approach that explores the similarity between instances across modalities. The main contributions of this paper are twofold: (1) we introduce semantic mask information into cross-modal hashing retrieval, and (2) we alternately train intra-modal and inter-modal networks to fully mine the semantic relationships between modalities. The semantic mask enriches the semantic information of image features. Inter-modal similarity, explored by the inter-modal networks, enforces images and their corresponding text tags to share similar hash codes, while intra-modal similarity, explored by the intra-modal networks, preserves the local structural information embedded within each modality. Extensive experiments conducted on three datasets demonstrate that our proposed MDCH approach is superior to several state-of-the-art cross-modal hashing approaches.
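To make the retrieval objective concrete, the sketch below illustrates (in a generic way, not the authors' MDCH networks) how binary hash codes support cross-modal retrieval: real-valued embeddings from each modality are binarized by sign, and items from the other modality are ranked by Hamming distance. All function names here are illustrative assumptions, not from the paper.

```python
import numpy as np

def to_hash_codes(embeddings):
    """Binarize real-valued embeddings into +/-1 hash codes via sign.

    Illustrative stand-in for the final binarization step of a deep
    hashing network; MDCH's actual architecture is not reproduced here.
    """
    return np.where(embeddings >= 0, 1, -1)

def hamming_distance(a, b):
    """Hamming distance between two +/-1 hash codes of equal length."""
    return int(np.sum(a != b))

def cross_modal_retrieve(query_code, gallery_codes):
    """Rank gallery items (from the other modality) by Hamming
    distance to the query code; smaller distance = more similar."""
    dists = [hamming_distance(query_code, g) for g in gallery_codes]
    return np.argsort(dists)
```

When an image and its text tag are trained to share similar hash codes (the inter-modal objective described above), the matching text ranks first under this Hamming-distance retrieval.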
