Abstract

Cross-modal hashing maps heterogeneous multimodal data into compact binary codes that preserve semantic similarity, greatly improving the efficiency and convenience of cross-modal retrieval. However, existing supervised cross-modal hashing methods generally only factorize the label matrix and thus do not fully exploit the supervision information. Furthermore, these methods often use only one-directional mapping, which makes the hash learning process unstable. To address these problems, we propose a new supervised cross-modal hash learning method, Discrete Two-step Cross-modal Hashing (DTCH), that exploits pairwise relations. Specifically, the method fully exploits the pairwise similarity relations contained in the supervision information: for the label matrix, the hash learning process is stabilized by combining matrix factorization with label regression; for the pairwise similarity matrix, a semi-relaxed, semi-discrete strategy is adopted to reduce the accumulated quantization error while improving retrieval efficiency and accuracy. The approach further combines an exploration of fine-grained features in the objective function with a novel out-of-sample extension strategy to implicitly preserve consistency between the distributions of the different modalities and the pairwise similarity relations. Extensive experiments on two widely used datasets verify the superiority of our method.
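
To make the bidirectional idea concrete, the core of such an objective can be sketched as follows. The notation here is our own illustration, not the paper's exact formulation: B is the binary code matrix, Y the label matrix, U and V are projection matrices, and λ is a trade-off weight. The first term factorizes the labels from the codes while the second regresses the codes back from the labels, so the supervision information constrains the codes in both directions.

```latex
\min_{\mathbf{B},\,\mathbf{U},\,\mathbf{V}}
  \|\mathbf{Y} - \mathbf{U}\mathbf{B}\|_F^2
  + \lambda \|\mathbf{B} - \mathbf{V}\mathbf{Y}\|_F^2
  \quad \text{s.t. } \mathbf{B} \in \{-1, 1\}^{r \times n}
```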

Highlights

  • With the development of Internet technology in recent years, large quantities of multimodal data from video, audio, image, text, and other sources are being disseminated rapidly across social networks

  • As the similarities between hash codes are computed as Hamming distances, the XOR operation can be implemented in hardware to significantly improve retrieval efficiency (see the sketch after this list)

  • The one-directional regression used by supervised methods is not conducive to fully exploiting the supervision information and makes the hash learning process unstable. Therefore, in this paper we propose a new supervised cross-modal hashing method that combines label matrix factorization and hash code regression to achieve bidirectional mapping
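
The XOR-based comparison mentioned in the second highlight is small enough to show directly. Below is a minimal sketch in plain Python, assuming hash codes are packed into integers; it uses nothing beyond the standard library.

```python
# Minimal sketch: Hamming distance between packed binary hash codes.
# XOR marks the bit positions where the codes differ; popcount counts them.

def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where two hash codes differ."""
    return (a ^ b).bit_count()  # int.bit_count() requires Python >= 3.10

# Two 8-bit codes differing in two positions.
print(hamming_distance(0b10110100, 0b10011100))  # -> 2
```

In hardware, the same XOR-plus-popcount pair costs only a couple of instructions, which is what makes Hamming ranking so much cheaper than comparing real-valued features.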

Summary

Introduction

With the development of Internet technology in recent years, large quantities of multimodal data from video, audio, image, text, and other sources are being disseminated rapidly across social networks. A common requirement in real scenarios is cross-modal retrieval, e.g., retrieving corresponding images or videos through text descriptions. Owing to its high retrieval efficiency and low space cost, cross-modal hashing has become one of the primary methods in the field of cross-modal retrieval [1]. Cross-modal hash learning attempts to convert multimodal data into short binary codes (called hash codes) in a Hamming space while preserving the original sample relations, and to learn a set of mapping functions from each specific modality to the sample's hash code. The binary codes of the common space and the mappings from each modality into that space can then be used to achieve cross-modal retrieval. The storage cost is reduced because the hash code length is relatively short.
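
As a concrete illustration of that pipeline, the sketch below maps each modality into a shared Hamming space with linear hash functions and ranks a database by Hamming distance. Everything here is a hypothetical placeholder: W_img and W_txt are random projections, whereas an actual method such as DTCH learns them from the supervision information.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): 128-d image features, 64-d text features,
# 16-bit codes. Real methods learn the projections; here they are random.
d_img, d_txt, n_bits = 128, 64, 16
W_img = rng.standard_normal((d_img, n_bits))
W_txt = rng.standard_normal((d_txt, n_bits))

def hash_codes(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Map real-valued features to {0, 1} codes via the sign of a projection."""
    return (X @ W > 0).astype(np.uint8)

# Cross-modal retrieval: rank database images against one text query.
images = rng.standard_normal((1000, d_img))  # image "database" features
query = rng.standard_normal((1, d_txt))      # one text query's features

db_codes = hash_codes(images, W_img)
q_code = hash_codes(query, W_txt)

hamming = (db_codes != q_code).sum(axis=1)   # per-item Hamming distance
print(np.argsort(hamming)[:10])              # indices of the 10 nearest codes
```

With random projections the ranking is of course meaningless; the point is only the shape of the pipeline: one hash function per modality, a shared binary space, and cheap XOR/popcount-style comparisons at query time.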
