Abstract

Cross-media hash retrieval encodes data from different media into a common binary hash space, which effectively measures the correlation between samples from different modalities. To further improve retrieval performance, this paper proposes an unsupervised cross-media hash retrieval method based on a multi-head attention network. First, we use a multi-head attention network to generate a hash code matrix, which allows images and texts to be matched more accurately. Second, an auxiliary similarity matrix is constructed to integrate the original neighborhood information from the different modalities. Through collaborative learning of the auxiliary similarity matrix and the hash code matrix, our method can capture latent correlations both across modalities and within each modality. In addition, we design two loss functions to train the model and adopt batch normalization together with a replacement of the hash code generation function to optimize it, which greatly improves training speed. Experiments on three datasets show that the average performance of our method is significantly higher than that of many state-of-the-art unsupervised methods, demonstrating its effectiveness and superiority.
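The following is a minimal sketch, not the authors' released code, of the ideas the abstract outlines: a multi-head attention module that maps modality features to relaxed hash codes (tanh in place of sign, with batch normalization), an auxiliary similarity matrix fused from intra-modal similarities, and a two-term loss aligning code similarities with that matrix. The layer sizes, fusion weight `alpha`, and exact loss terms are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHashNet(nn.Module):
    """Maps pre-extracted image or text features to K-bit hash codes
    through a multi-head attention block; tanh keeps code generation
    differentiable during training (assumed relaxation of sign)."""
    def __init__(self, feat_dim=512, num_heads=8, code_len=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.BatchNorm1d(feat_dim)  # batch normalization, as mentioned in the abstract
        self.fc = nn.Linear(feat_dim, code_len)

    def forward(self, x):                      # x: (batch, feat_dim)
        q = x.unsqueeze(1)                     # (batch, 1, feat_dim)
        attended, _ = self.attn(q, q, q)       # self-attention over the features
        h = self.norm(attended.squeeze(1))
        return torch.tanh(self.fc(h))          # relaxed binary codes in (-1, 1)

def auxiliary_similarity(img_feat, txt_feat, alpha=0.5):
    """Fuse intra-modal cosine-similarity matrices into one auxiliary
    similarity matrix (this particular fusion rule is an assumption)."""
    s_img = F.normalize(img_feat) @ F.normalize(img_feat).t()
    s_txt = F.normalize(txt_feat) @ F.normalize(txt_feat).t()
    return alpha * s_img + (1 - alpha) * s_txt

def hashing_loss(b_img, b_txt, s_aux):
    """Align cross-modal and intra-modal code similarities with the
    auxiliary matrix; the two-term split mirrors the 'two loss functions'
    in the abstract, not necessarily their exact formulation."""
    k = b_img.size(1)
    cross = ((b_img @ b_txt.t()) / k - s_aux).pow(2).mean()
    intra = ((b_img @ b_img.t()) / k - s_aux).pow(2).mean() \
          + ((b_txt @ b_txt.t()) / k - s_aux).pow(2).mean()
    return cross + intra
```

At retrieval time the continuous codes would be binarized, e.g. with `torch.sign`, and compared by Hamming distance; during training the tanh relaxation lets gradients flow through the code generation step.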
