With the rapid development of deep neural networks, cross-modal hashing has made great progress. However, the information carried by different modalities is asymmetric: a sufficiently high-resolution image can reproduce a real-world scene almost completely, whereas text often carries personal emotion and is less objective, so images are generally considered more information-rich than text. Although most existing methods unify the semantic feature extraction and hash function learning modules for end-to-end training, they ignore this asymmetry and do not use information-rich modalities to support information-poor ones, leading to suboptimal results. Furthermore, previous methods learn hash functions in a relaxed manner, which incurs nontrivial quantization loss. To address these issues, we propose a new method, graph convolutional network-based discrete hashing (GCDH), which uses a graph convolutional network (GCN) to bridge the information gap between modalities. The GCN represents each label as a word embedding and treats the embeddings as a set of interdependent object classifiers; the labels predicted by these classifiers are used to enhance feature representations across modalities. In addition, we adopt an efficient discrete optimization strategy that learns binary codes directly, without relaxation. Extensive experiments on three widely used datasets demonstrate that GCDH outperforms state-of-the-art cross-modal hashing methods.
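The abstract does not give implementation details, but the label-embedding-as-classifier idea it describes resembles ML-GCN-style designs. Below is a minimal, hypothetical PyTorch sketch of that idea: a two-layer GCN over label word embeddings produces one classifier vector per label, and the resulting label predictions can then be matched against image or text features. All names (`LabelGCN`, `GraphConvolution`), dimensions, and the random stand-ins for the embeddings and the adjacency matrix are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConvolution(nn.Module):
    """One GCN layer: H' = A_hat @ H @ W, where A_hat is a
    normalized label-correlation adjacency matrix (assumed)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_dim, out_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, h, adj):
        return adj @ h @ self.weight

class LabelGCN(nn.Module):
    """Maps label word embeddings to a set of interdependent
    classifiers, one per label (hypothetical module)."""
    def __init__(self, embed_dim, hidden_dim, feat_dim):
        super().__init__()
        self.gc1 = GraphConvolution(embed_dim, hidden_dim)
        self.gc2 = GraphConvolution(hidden_dim, feat_dim)

    def forward(self, label_embeds, adj):
        h = F.leaky_relu(self.gc1(label_embeds, adj), 0.2)
        return self.gc2(h, adj)  # (num_labels, feat_dim) classifier matrix

# Toy usage: 20 labels with 300-d word embeddings (e.g., GloVe-style),
# and 512-d features from an image or text encoder.
num_labels, embed_dim, feat_dim = 20, 300, 512
label_embeds = torch.randn(num_labels, embed_dim)  # stand-in for real word embeddings
adj = torch.softmax(torch.randn(num_labels, num_labels), dim=1)  # stand-in for a normalized co-occurrence matrix

gcn = LabelGCN(embed_dim, hidden_dim=1024, feat_dim=feat_dim)
classifiers = gcn(label_embeds, adj)

features = torch.randn(8, feat_dim)   # a batch of modality features
logits = features @ classifiers.t()   # predicted label scores, shape (8, num_labels)
```

In designs of this kind, the predicted label scores (`logits` above) are typically fused back into the per-modality features, for example by concatenation or weighting, before hash codes are computed; how GCDH performs this fusion is not specified in the abstract.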