Abstract

Recently, benefiting from the storage and retrieval efficiency of hashing and the powerful discriminative feature extraction capability of deep neural networks, deep cross-modal hashing retrieval has attracted increasing attention. To preserve the semantic similarities of cross-modal instances during the hash mapping procedure, most existing deep cross-modal hashing methods learn deep hashing networks with a pairwise loss or a triplet loss. However, these losses may not fully explore the similarity relations across modalities. To address this problem, in this paper we introduce a quadruplet loss into deep cross-modal hashing and propose a quadruplet-based deep cross-modal hashing (termed QDCMH) method. Extensive experiments on two benchmark cross-modal retrieval datasets show that our proposed method achieves state-of-the-art performance and demonstrate the effectiveness of the quadruplet loss in cross-modal hashing.

Highlights

  • With the advent of the era of big data, massive amounts of multimedia data, such as images, videos, and texts, are surging onto the Internet. These data usually exist in diverse modalities; for example, a textual description and an audio track may accompany a video or an image

  • Deep convolutional neural networks [11, 12] have been successfully utilized in many computer vision tasks, and some researchers have deployed them in cross-modal hashing, such as deep cross-modal hashing (DCMH) [13], pairwise relationship guided deep hashing (PRDH) [14], self-supervised adversarial hashing (SSAH) [15], and triplet-based deep hashing (TDH) [16]

  • We can observe the following: (1) SSAH outperforms our proposed quadruplet-based deep cross-modal hashing (QDCMH) method in most cases, partly because SSAH takes self-supervised learning and generative adversarial networks into account during the hash representation learning procedure. (2) The MAPs of QDCMH are always higher than those of TDH, which shows that the quadruplet loss can preserve semantic relevance better than the triplet loss in cross-modal hashing retrieval. (3) The MAPs of DSePH are always higher than those of semantics-preserving hashing (SePH), which demonstrates that deep neural networks have powerful feature learning capacity (MAP is computed as in the retrieval-evaluation sketch after these highlights)
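For reference, the mean average precision (MAP) scores compared above are typically obtained by Hamming-ranking the database for each query and averaging precision at the ranks of relevant items. The sketch below shows this computation for a single-label setting; the {-1, +1} code convention and the label-match relevance criterion are assumptions, and the paper's exact evaluation protocol may differ.

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """MAP for Hamming-ranking retrieval (single-label, illustrative sketch).

    query_codes, db_codes   -- binary hash codes in {-1, +1}, shape (n, bits)
    query_labels, db_labels -- integer class labels used to judge relevance
    """
    aps = []
    for q_code, q_label in zip(query_codes, query_labels):
        # Hamming distance via inner product for +-1 codes
        hamming = 0.5 * (db_codes.shape[1] - db_codes @ q_code)
        order = np.argsort(hamming)                        # rank database by distance
        relevant = (db_labels[order] == q_label).astype(float)
        if relevant.sum() == 0:
            continue                                       # skip queries with no relevant items
        precision_at_k = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)
        aps.append((precision_at_k * relevant).sum() / relevant.sum())
    return float(np.mean(aps))
```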


Summary

Introduction

With the advent of the era of big data, massive amounts of multimedia data, such as images, videos, and texts, are surging onto the Internet. These data usually exist in diverse modalities; for example, a textual description and an audio track may accompany a video or an image. Most deep cross-modal hashing methods utilize a pairwise loss (such as [13,14,15]) or a triplet loss (such as [16]) to preserve semantic relevance during the hash representation learning procedure. To this end, in this paper we introduce a quadruplet loss into cross-modal hashing and propose a quadruplet-based deep cross-modal hashing method (QDCMH). (i) We introduce the quadruplet loss into cross-modal retrieval and propose a novel deep cross-modal hashing method.
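This page does not reproduce QDCMH's loss formulation, so the following is only a minimal sketch of a generic intermodal quadruplet loss in PyTorch, in the spirit of margin-based quadruplet losses: the margins `margin1` and `margin2`, the Euclidean distance, and the function name are illustrative assumptions, not the paper's exact definition. The anchor codes come from one modality network (e.g., images) and the positive and negative codes from the other (e.g., text).

```python
import torch
import torch.nn.functional as F

def intermodal_quadruplet_loss(anchor, positive, negative1, negative2,
                               margin1=1.0, margin2=0.5):
    """Generic quadruplet loss over real-valued hash representations (sketch).

    anchor    -- codes from one modality (e.g., image network output)
    positive  -- codes of semantically matching instances from the other modality
    negative1 -- codes of non-matching instances from the other modality
    negative2 -- codes of a second, unrelated negative instance
    """
    d_ap = F.pairwise_distance(anchor, positive)       # anchor-positive distance
    d_an = F.pairwise_distance(anchor, negative1)      # anchor-negative distance
    d_nn = F.pairwise_distance(negative1, negative2)   # negative-negative distance

    term1 = F.relu(d_ap - d_an + margin1)  # triplet-style constraint
    term2 = F.relu(d_ap - d_nn + margin2)  # extra constraint contributed by the fourth item
    return (term1 + term2).mean()
```

Compared with a triplet loss, the second hinge term additionally requires matched cross-modal pairs to be closer than pairs of unrelated instances, which is the usual motivation for quadruplet-style objectives.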

Proposed Method
Intermodal quadruplet loss
Learning Algorithm of QDCMH
Experiments
Handcrafted methods
Deep methods
Conclusions