Abstract

To handle large-scale data in terms of storage and search time, learning to hash has become popular due to its efficiency and effectiveness in approximate cross-modal nearest neighbor search. To narrow the semantic gap, most existing unsupervised cross-modal hashing methods try to simultaneously minimize the loss of intra-modal similarity and the loss of inter-modal similarity. However, these models cannot theoretically guarantee that the two losses are minimized at the same time. In this paper, we first prove theoretically, with the aid of variational inference, that cross-modal hashing can be implemented by preserving both intra-modal and inter-modal similarity, and we point out that maximizing intra-modal and inter-modal similarity are mutually constrained objectives. We therefore propose an unsupervised cross-modal hashing framework, named Unsupervised Deep Fusion Cross-modal Hashing (UDFCH), which leverages data fusion to capture the underlying manifold across modalities and thereby avoids this problem. Moreover, to reduce the quantization loss, we sample hash codes from different Bernoulli distributions through a reparameterization trick. The UDFCH framework has two stages: the first stage mines the intra-modal structure of each modality, and the second stage determines the modality-aware hash codes by fully considering the correlation and manifold structure among modalities. Experiments on three benchmark datasets show that the proposed UDFCH framework outperforms state-of-the-art methods on different cross-modal retrieval tasks.
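The abstract mentions sampling hash codes from Bernoulli distributions via a reparameterization trick to reduce quantization loss. Below is a minimal sketch of one common way to realize such a layer, assuming a logistic-noise (Gumbel-style) relaxation with a straight-through estimator; the class name `BernoulliHashLayer`, the temperature value, and the code length are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class BernoulliHashLayer(nn.Module):
    """Maps features to binary hash codes via a relaxed Bernoulli sample (sketch)."""

    def __init__(self, feature_dim: int, code_length: int, temperature: float = 0.5):
        super().__init__()
        self.fc = nn.Linear(feature_dim, code_length)  # logits of per-bit Bernoulli distributions
        self.temperature = temperature

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        logits = self.fc(features)
        if self.training:
            # Reparameterization: add Logistic(0, 1) noise to the logits, then squash,
            # giving a differentiable relaxed Bernoulli sample in (0, 1).
            u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log(1 - u)
            soft_codes = torch.sigmoid((logits + noise) / self.temperature)
        else:
            soft_codes = torch.sigmoid(logits)
        # Straight-through estimator: binarize in the forward pass, but let the
        # gradient flow through the relaxed codes, mitigating quantization loss.
        hard_codes = (soft_codes > 0.5).float()
        return hard_codes + (soft_codes - soft_codes.detach())


# Usage sketch: map 512-d features to 64-bit hash codes.
hash_layer = BernoulliHashLayer(feature_dim=512, code_length=64)
hash_layer.train()
codes = hash_layer(torch.randn(8, 512))  # values in {0, 1}; gradients flow via the relaxation
```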
