Multi-modal hashing has attracted enormous attention in large-scale multimedia retrieval, owing to its low storage cost and fast Hamming distance computation. Existing multi-modal hashing methods assume that all multi-modal data are well paired and encode the paired modalities into joint binary codes. In practical applications, however, data are not guaranteed to be fully paired. In this paper, we present an adaptive semi-paired query hashing method that learns hash codes for semi-paired query samples. The proposed method performs projection learning and cross-modal reconstruction learning to maintain semantic consistency between multi-modal data. Meanwhile, the hash codes preserve the semantic similarity structure and the complementary multi-modal information, yielding a discriminative hash function. In the encoding stage, missing modality features are completed via the learned cross-modal reconstruction matrices. In addition, the multi-modal fusion weights are fine-tuned adaptively for new query data to capture modality differences. Extensive experimental results on three benchmark datasets show that the proposed algorithm outperforms state-of-the-art multi-modal hashing methods.
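The sketch below illustrates the encoding stage described above for a semi-paired query: a missing modality is completed with a learned cross-modal reconstruction matrix, the modalities are fused with adaptively chosen weights, and the result is binarized. All names (`W1`, `W2`, `R_1to2`, `encode_query`) and the inverse-disagreement weighting rule are illustrative assumptions, not the paper's actual learned parameters or optimization.

```python
import numpy as np

# Hypothetical learned parameters (shapes are illustrative only):
#   W1, W2   - per-modality projection matrices into the hash-code space
#   R_1to2   - cross-modal reconstruction matrix (modality 1 -> modality 2)
d1, d2, code_len = 128, 64, 32
rng = np.random.default_rng(0)
W1 = rng.standard_normal((code_len, d1))
W2 = rng.standard_normal((code_len, d2))
R_1to2 = rng.standard_normal((d2, d1))

def encode_query(x1, x2, alpha=None):
    """Encode a (possibly semi-paired) query into binary hash codes.

    x1, x2 : modality feature vectors; x2 may be None (missing modality).
    alpha  : fusion weights; if None, they are set adaptively so the modality
             that agrees better with the fused projection gets more weight.
    """
    # Complete the missing modality with the learned reconstruction matrix.
    if x2 is None:
        x2 = R_1to2 @ x1
    # (the symmetric case, reconstructing x1 from x2, is omitted for brevity)

    z1, z2 = W1 @ x1, W2 @ x2
    if alpha is None:
        # Simple adaptive weighting sketch: weight each modality by the
        # inverse of its disagreement with the mean projection.
        z_mean = 0.5 * (z1 + z2)
        err = np.array([np.linalg.norm(z1 - z_mean),
                        np.linalg.norm(z2 - z_mean)]) + 1e-8
        alpha = (1.0 / err) / np.sum(1.0 / err)

    fused = alpha[0] * z1 + alpha[1] * z2
    return np.sign(fused)  # binary codes in {-1, +1}

# Example: a query that provides only modality 1.
x1 = rng.standard_normal(d1)
codes = encode_query(x1, None)
print(codes[:8])
```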