Abstract

Hashing based methods for cross-modal retrieval have been widely explored in recent years. However, most of them focus mainly on preserving neighborhood relationships and label consistency, while ignoring the proximity of neighbors and the proximity of classes, which degrades the discrimination of hash codes. Moreover, most of them learn hash codes and hashing functions simultaneously, which limits the flexibility of the algorithms. To address these issues, in this article we propose a two-step cross-modal retrieval method named Manifold-Embedded Semantic Hashing (MESH). It exploits Local Linear Embedding to model neighborhood proximity and uses class semantic embeddings to account for the proximity of classes. In this way, MESH can not only extract the manifold structure of different modalities but also embed class semantic information into the hash codes, further improving their discrimination. Moreover, the two-step scheme makes MESH flexible with respect to the choice of hashing function. Extensive experimental results on three datasets show that MESH is superior to 10 state-of-the-art cross-modal hashing methods. MESH also demonstrates superiority on deep features compared with a deep cross-modal hashing method.
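To make the two-step idea concrete, here is a minimal single-modality sketch: first embed the features on their local manifold with Local Linear Embedding (as MESH does in its first step), then quantize the real-valued embedding into binary hash codes. This is an illustrative approximation only, assuming scikit-learn's LLE, random toy features, and a simple mean-thresholding quantizer; it is not the paper's actual formulation, which also incorporates class semantic embeddings.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Toy feature matrix standing in for one modality (e.g. image features).
# The sample count, neighbor count, and code length are illustrative choices.
rng = np.random.RandomState(0)
X = rng.rand(100, 32)

# Step 1: learn a low-dimensional embedding that preserves local
# neighborhood proximity on the data manifold.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=16, random_state=0)
Y = lle.fit_transform(X)  # shape (100, 16), real-valued

# Step 2: quantize the embedding into 16-bit binary hash codes.
# Thresholding at the per-dimension mean is a stand-in for the
# discrete optimization a real hashing method would use.
codes = (Y > Y.mean(axis=0)).astype(np.int8)  # entries in {0, 1}

print(codes.shape)
```

A second, separate step would then fit a hashing function (e.g. a linear or kernel classifier per bit) to map unseen queries to these codes, which is what gives two-step schemes their flexibility.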

Highlights

  • Due to its significant role in many real-world applications, cross-modal retrieval has been widely studied in recent years [1]–[4].

  • We obtain observations similar to those on Wiki and LabelMe: 1) Manifold-Embedded Semantic Hashing (MESH) is superior to the other baselines at varied code lengths on both tasks.

  • 4) MESH, Discrete Cross-modal Hashing (DCH), and Fast Discrete Cross-modal Hashing (FDCH) form the first group, performing much better than the other baselines, which further demonstrates the effectiveness of learning hash codes discretely in reducing quantization error.


Summary

Introduction

Due to its significant role in many real-world applications, cross-modal retrieval has been widely studied in recent years [1]–[4]. Among the existing methods for cross-modal retrieval, hashing based methods (cross-modal hashing) are the most representative and have been investigated intensively. A large number of cross-modal hashing methods have been proposed recently, comprising shallow learning [7]–[10] and deep learning methods [11]–[13]. All these methods have made significant efforts toward improving the performance of cross-modal retrieval. Although deep learning based methods usually obtain superior performance, they are sample-intensive and require large amounts of labelled training data. Many state-of-the-art methods are still shallow learning models, which are the main concern of this paper.

