Abstract

In recent years, with the increasing amount of data in different modalities, cross-modal retrieval has become an important research task. Deep cross-modal hashing has received increasing attention because it combines the low storage cost and high search efficiency of hashing with the strong feature-extraction capability of neural networks. Most existing deep cross-modal hashing methods extract the global semantic features of images and texts to generate hash codes through two independent networks. In this work, we propose a novel method called Global and Local Feature based Deep Cross-Modal Hashing (GLFH). GLFH extracts not only the global features of images and texts, but also the features of object regions in images and of words in texts, by introducing an object detection method and a recurrent neural network, respectively. The local features of the images and texts are then obtained by fusing the region features and the word features with an adaptively learned attention matrix. Finally, the global and local features are fused to obtain richer image and text representations, from which more robust hash codes are generated. Experimental results on three widely used datasets show that GLFH significantly outperforms other methods, verifying its effectiveness.

Keywords: Cross-modal hashing; Deep learning; Global feature and local feature; Attention
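The pipeline summarized in the abstract (region and word features, attention-based fusion into local features, concatenation with global features, hash code generation) can be sketched roughly as below. This is a minimal PyTorch sketch under assumed details the abstract does not give: the feature dimension, the number of hash bits, the bilinear form of the attention matrix, the mean-pooled attention weights, and the tanh hash layer are all illustrative choices, not the authors' exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalHashing(nn.Module):
    def __init__(self, dim=512, hash_bits=64):
        super().__init__()
        # Adaptively learned attention matrix relating image regions to text words
        # (assumed bilinear form; the paper may parameterize it differently).
        self.attn = nn.Parameter(torch.randn(dim, dim) * 0.01)
        # Hash layers map the fused (global + local) features to continuous codes.
        self.img_hash = nn.Linear(2 * dim, hash_bits)
        self.txt_hash = nn.Linear(2 * dim, hash_bits)

    def forward(self, img_global, region_feats, txt_global, word_feats):
        # img_global:   (B, dim)      global image feature
        # region_feats: (B, R, dim)   object-region features from a detector
        # txt_global:   (B, dim)      global text feature
        # word_feats:   (B, W, dim)   word features from a recurrent network
        # Bilinear region-word affinity scores.
        affinity = torch.einsum('brd,de,bwe->brw', region_feats, self.attn, word_feats)
        # Attend over regions by their aggregate affinity with all words,
        # and over words by their aggregate affinity with all regions.
        region_w = F.softmax(affinity.mean(dim=2), dim=1)   # (B, R)
        word_w = F.softmax(affinity.mean(dim=1), dim=1)     # (B, W)
        img_local = torch.einsum('br,brd->bd', region_w, region_feats)
        txt_local = torch.einsum('bw,bwd->bd', word_w, word_feats)
        # Fuse global and local features, then emit relaxed (tanh) hash codes;
        # binary codes would be obtained with sign() at retrieval time.
        img_code = torch.tanh(self.img_hash(torch.cat([img_global, img_local], dim=-1)))
        txt_code = torch.tanh(self.txt_hash(torch.cat([txt_global, txt_local], dim=-1)))
        return img_code, txt_code

The learned parameter attn here stands in for the "adaptively learned attention matrix" mentioned in the abstract; the loss functions and training procedure are not described in the abstract and are therefore omitted.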
