Abstract

Although unsupervised deep hashing is potentially very useful for tackling many large-scale tasks, its performance is still far from satisfactory. Its performance might be significantly improved by effectively exploiting the pairwise similarity relationships among training data, but the attained similarity matrix usually contains noisy information, which often substantially degrades model performance. To alleviate this issue, we propose a novel unsupervised deep pairwise hashing method that effectively and robustly exploits the similarity information between training samples and multiple anchors. We first create an ensemble anchor-based pairwise similarity matrix to enhance the robustness of the similarity and dissimilarity relations between training samples and anchors. We then propose a novel loss function that directly and robustly takes advantage of the similarity and dissimilarity information via a weighted cross-entropy loss, uses a square loss to reduce the gap between latent binary vectors and binary codes, and uses another square loss to form consensus predictions of latent binary vectors. Extensive experiments on large-scale benchmark databases demonstrate the effectiveness of the proposed method, which outperforms recent state-of-the-art unsupervised hashing methods with significantly better ranking performance.
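
The three-term loss described above can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything specific here is an assumption made for illustration: the function name `three_term_loss`, the use of a sigmoid over inner products between latent vectors and anchor codes, the weights `w_sim`, `w_dis`, `alpha`, and `beta`, and the interpretation of "consensus" as agreement among several predictions (`z_views`) of the same samples.

```python
import numpy as np

def three_term_loss(z_views, anchor_codes, s,
                    w_sim=1.0, w_dis=1.0, alpha=1.0, beta=1.0):
    """Hypothetical sketch of a three-term hashing loss (not the paper's exact form).

    z_views:      (V, n, k) latent vectors in [-1, 1], V predictions per sample
    anchor_codes: (m, k) anchor codes in {-1, +1}
    s:            (n, m) sample-to-anchor labels, 1 = similar, 0 = dissimilar
    """
    z_mean = z_views.mean(axis=0)                # consensus latent vector, (n, k)
    b = np.where(z_mean >= 0, 1.0, -1.0)         # binary codes taken from the sign

    # Term 1: weighted cross-entropy on sample-to-anchor similarity.
    logits = z_mean @ anchor_codes.T             # (n, m)
    log_p = -np.logaddexp(0.0, -logits)          # log sigmoid(logits), stable
    log_not_p = -np.logaddexp(0.0, logits)       # log(1 - sigmoid(logits)), stable
    ce = -np.mean(w_sim * s * log_p + w_dis * (1.0 - s) * log_not_p)

    # Term 2: square loss between latent vectors and their binary codes.
    quant = np.mean((z_mean - b) ** 2)

    # Term 3: square loss pulling each prediction toward the consensus.
    consensus = np.mean((z_views - z_mean) ** 2)

    total = ce + alpha * quant + beta * consensus
    return total, ce, quant, consensus

# Toy usage with random data (3 predictions, 5 samples, 8 bits, 4 anchors).
rng = np.random.default_rng(0)
z_views = np.tanh(rng.normal(size=(3, 5, 8)))
anchor_codes = np.where(rng.normal(size=(4, 8)) >= 0, 1.0, -1.0)
s = (rng.random((5, 4)) > 0.5).astype(float)
total, ce, quant, consensus = three_term_loss(z_views, anchor_codes, s)
```

In this sketch the second and third terms vanish when every prediction is already an identical binary vector, which is the behavior the two square losses are meant to encourage.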

Highlights

  • Hashing has attracted considerable attention for tackling large-scale tasks because it can encode high-dimensional data into short binary codes while preserving the similarity of neighbors, thereby leading to significant savings in computation and storage costs [1,2].

  • We propose a novel loss function composed of three terms: a weighted cross-entropy loss to exploit the similarity information between training data and multiple anchors, a mean square loss to reduce the gap between latent binary vectors and desired codes, and another mean square loss to form consensus predictions of latent binary vectors.

  • Note that similarity-adaptive deep hashing (SADH) usually achieves its best mean average precision (MAP) with short binary codes, e.g., 16-bit, while unsupervised deep pairwise hashing (UDPH) performs better as the number of bits increases. This might be because SADH can effectively preserve the similarity information of the low-rank graph matrix by using short binary codes, but UDPH with longer binary codes can better preserve the similarity relationship between training data and anchors.
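
For readers unfamiliar with the MAP metric mentioned above, the following is a minimal sketch of mean average precision under Hamming ranking. The function names and the toy data are assumptions for illustration; the evaluation protocol used in the paper (for example, any cutoff on the ranked list) is not stated in this summary and may differ.

```python
import numpy as np

def average_precision(relevance):
    """AP for one ranked list; relevance[i] is 1 if the i-th result is relevant."""
    relevance = np.asarray(relevance, dtype=float)
    n_rel = relevance.sum()
    if n_rel == 0:
        return 0.0                                   # no relevant items for this query
    ranks = np.arange(1, len(relevance) + 1)
    precision_at_k = np.cumsum(relevance) / ranks    # precision after each result
    return float((precision_at_k * relevance).sum() / n_rel)

def mean_average_precision(query_codes, db_codes, relevant):
    """MAP under Hamming ranking; codes are in {-1, +1}, relevant is (q, n) boolean."""
    k = query_codes.shape[1]
    # For +/-1 codes, Hamming distance = (k - inner product) / 2.
    hamming = (k - query_codes @ db_codes.T) / 2
    aps = []
    for dist, rel in zip(hamming, relevant):
        order = np.argsort(dist, kind="stable")      # nearest codes first
        aps.append(average_precision(rel[order]))
    return float(np.mean(aps))

# Toy usage: one 2-bit query against three database codes.
db_codes = np.array([[1, 1], [1, -1], [-1, -1]])
query_codes = np.array([[1, 1]])
relevant = np.array([[True, False, True]])
map_score = mean_average_precision(query_codes, db_codes, relevant)
```

In the toy example the relevant items land at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.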


Summary

Introduction

Hashing has attracted considerable attention for tackling large-scale tasks because it can encode high-dimensional data into short binary codes while preserving the similarity of neighbors, thereby leading to significant savings in computation and storage costs [1,2]. Supervised hashing [3,4] usually requires a large number of labels to achieve satisfactory performance, but label annotation is usually time-consuming and expensive. By contrast, unsupervised hashing [5] does not need semantic labels and aims to discover the significant intrinsic patterns or structures hidden in data and encode them into binary codes. Although numerous data-dependent hashing methods have been proposed and have achieved promising performance under various similarity measures, such as Euclidean distance and 1-norm distance [7], they are still far from satisfactory for many tasks under the semantic similarity measure. Most of them [5,8,9] learn hash functions from handcrafted features, which might not optimally represent the image content [10].
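
To make the idea of similarity-preserving binary codes concrete, here is a small sketch using random hyperplane projections. This is a generic, data-independent hashing scheme chosen only for illustration; it is not the method proposed in the paper, and the names `random_hyperplane_codes` and `hamming_distance`, as well as the dimensions, are assumptions.

```python
import numpy as np

def random_hyperplane_codes(x, projections):
    """Map rows of x to bits: 1 where the projection is non-negative, else 0."""
    return (x @ projections >= 0).astype(np.uint8)

def hamming_distance(a, b):
    """Number of bit positions in which two codes differ."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
dim, bits = 128, 32
projections = rng.normal(size=(dim, bits))   # one random hyperplane per bit

x = rng.normal(size=dim)
near = x + 0.05 * rng.normal(size=dim)       # small perturbation of x
far = rng.normal(size=dim)                   # unrelated vector

codes = random_hyperplane_codes(np.stack([x, near, far]), projections)
d_near = hamming_distance(codes[0], codes[1])
d_far = hamming_distance(codes[0], codes[2])

# Pack 32 bits into 4 bytes per item, versus 512 bytes for 128 float32 values.
packed = np.packbits(codes, axis=1)
```

Neighbors in the original space tend to receive codes with a small Hamming distance (`d_near`), while unrelated vectors differ in roughly half of their bits (`d_far`). The packed codes show where the storage savings come from, and comparing them needs only bitwise operations.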

