Abstract

Nearest neighbour search (NNS) is the core of large data retrieval. Learning to hash is an effective way to solve the problems by representing high-dimensional data into a compact binary code. However, existing learning to hash methods needs long bit encoding to ensure the accuracy of query, and long bit encoding brings large cost of storage, which severely restricts the long bit encoding in the application of big data. An asymmetric learning to hash with variable bit encoding algorithm (AVBH) is proposed to solve the problem. The AVBH hash algorithm uses two types of hash mapping functions to encode the dataset and the query set into different length bits. For datasets, the hash code frequencies of datasets after random Fourier feature encoding are statistically analysed. The hash code with high frequency is compressed into a longer coding representation, and the hash code with low frequency is compressed into a shorter coding representation. The query point is quantized to a long bit hash code and compared with the same length cascade concatenated data point. Experiments on public datasets show that the proposed algorithm effectively reduces the cost of storage and improves the accuracy of query.

Highlights

  • Given a query object/point q and a dataset S, the nearest neighbour search (NNS) [1,2,3] is to return the nearest neighbours in S to q

  • Based on the feature mapping method of random Fourier feature (RFF), data are mapped to the characteristic space under the approximate kernel function, and the inner product of any two points under the feature space is approximated by their kernel function values

  • We compared the performance of AVBH with that of several typical hashing methods: ACH [12], ITQ [19], KMH [20], PCAH [21], and LSH [13]

Read more

Summary

Introduction

Given a query object/point q and a dataset S, the nearest neighbour search (NNS) [1,2,3] is to return the nearest neighbours in S to q. Different from the direct hash code comparison, by cascade concatenating the coding of the data point to the same encoding length of the query point, the coding storage cost of the dataset is reduced effectively and the accuracy of the result is ensured This algorithm uses a unified compression method for all data, ignoring the effect of data distribution. E main contributions of this paper are as follows: (1) a variable bit encoding mechanism (named AVBH) based on hash code frequency compression is proposed, which makes the encoding space effectively used, and (2) the experiment shows that the AVBH can effectively reduce the storage cost and improve the query accuracy

Preliminaries and Description
Asymmetric Learning to Hash with Variable Bit Encoding
Encoding Functions
Experimental Datasets
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call