Abstract

In recent years, with the increasing amount of data in different modalities, cross-modal retrieval has become an important research task. Deep cross-modal hashing has received increasing attention because it combines the low storage cost and high search efficiency of hashing with the strong feature-extraction capability of neural networks. Most existing deep cross-modal hashing methods extract the global semantic features of images and texts to generate hash codes through two independent networks. In this work, we propose a novel method called Global and Local Feature based Deep Cross-Modal Hashing (GLFH). GLFH extracts not only the global features of images and texts, but also the features of object regions in images and of words in texts, by introducing an object detection method and a recurrent neural network, respectively. The local features of the images and texts are then obtained by fusing the region features and the word features with an adaptively learned attention matrix. Finally, the global and local features are fused to obtain richer image and text representations, from which more robust hash codes are generated. Experimental results on three widely used datasets show that GLFH significantly outperforms other methods, verifying its effectiveness.

Keywords: Cross-modal hashing; Deep learning; Global feature and local feature; Attention
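The pipeline summarized in the abstract (region and word features, attention-based fusion into local features, concatenation with global features, hash code generation) can be sketched roughly as below. This is a minimal PyTorch sketch under assumed details the abstract does not give: the feature dimension, the number of hash bits, the bilinear form of the attention matrix, the mean-pooled attention weights, and the tanh hash layer are all illustrative choices, not the authors' exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalHashing(nn.Module):
    def __init__(self, dim=512, hash_bits=64):
        super().__init__()
        # Adaptively learned attention matrix relating image regions to text words
        # (assumed bilinear form; the paper may parameterize it differently).
        self.attn = nn.Parameter(torch.randn(dim, dim) * 0.01)
        # Hash layers map the fused (global + local) features to continuous codes.
        self.img_hash = nn.Linear(2 * dim, hash_bits)
        self.txt_hash = nn.Linear(2 * dim, hash_bits)

    def forward(self, img_global, region_feats, txt_global, word_feats):
        # img_global:   (B, dim)      global image feature
        # region_feats: (B, R, dim)   object-region features from a detector
        # txt_global:   (B, dim)      global text feature
        # word_feats:   (B, W, dim)   word features from a recurrent network
        # Bilinear region-word affinity scores.
        affinity = torch.einsum('brd,de,bwe->brw', region_feats, self.attn, word_feats)
        # Attend over regions by their aggregate affinity with all words,
        # and over words by their aggregate affinity with all regions.
        region_w = F.softmax(affinity.mean(dim=2), dim=1)   # (B, R)
        word_w = F.softmax(affinity.mean(dim=1), dim=1)     # (B, W)
        img_local = torch.einsum('br,brd->bd', region_w, region_feats)
        txt_local = torch.einsum('bw,bwd->bd', word_w, word_feats)
        # Fuse global and local features, then emit relaxed (tanh) hash codes;
        # binary codes would be obtained with sign() at retrieval time.
        img_code = torch.tanh(self.img_hash(torch.cat([img_global, img_local], dim=-1)))
        txt_code = torch.tanh(self.txt_hash(torch.cat([txt_global, txt_local], dim=-1)))
        return img_code, txt_code

The learned parameter attn here stands in for the "adaptively learned attention matrix" mentioned in the abstract; the loss functions and training procedure are not described in the abstract and are therefore omitted.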
