AVBH: Asymmetric Learning to Hash with Variable Bit Encoding

Yanduo Ren, Yu Xin, Huahui Chen, Jiangbo Qian, Yihong Dong

doi:10.1155/2020/2424381

Abstract

Nearest neighbour search (NNS) is at the core of large-scale data retrieval. Learning to hash is an effective way to address this problem by representing high-dimensional data as compact binary codes. However, existing learning to hash methods need long bit encodings to ensure query accuracy, and long encodings incur a large storage cost, which severely restricts their use in big data applications. To solve this problem, an asymmetric learning to hash with variable bit encoding algorithm (AVBH) is proposed. AVBH uses two types of hash mapping functions to encode the dataset and the query set into codes of different bit lengths. For the dataset, the frequencies of the hash codes obtained after random Fourier feature encoding are statistically analysed. Hash codes with high frequency are compressed into a longer coding representation, and hash codes with low frequency are compressed into a shorter coding representation. Each query point is quantized to a long-bit hash code and compared with data-point codes that have been cascade-concatenated to the same length. Experiments on public datasets show that the proposed algorithm effectively reduces storage cost and improves query accuracy.
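The abstract does not spell out how a frequent code is kept long or how an infrequent one is shortened. The Python sketch below is one hypothetical way to realise the idea, not the AVBH procedure itself: it counts how often each long hash code occurs, treats the most frequent fraction of distinct codes (the hypothetical `top_fraction` parameter) as high-frequency, and folds every code to `long_m` or `short_m` bits. The folding rule, a per-position majority vote over bits whose indices are congruent modulo the target length, is an assumption chosen because it minimises the Hamming distance between the original code and the folded code repeated back to full length. All function names are illustrative.

```python
from collections import Counter


def fold_code(code: str, m: int) -> str:
    """Hypothetical compression: fold an L-bit code into m bits so that
    repeating the result L/m times stays as close as possible (in Hamming
    distance) to the original. Bit i is the majority of positions i, i+m, ..."""
    L = len(code)
    if L % m != 0:
        raise ValueError("target length must divide the code length")
    out = []
    for i in range(m):
        group = code[i::m]  # bits at positions i, i+m, i+2m, ...
        ones = group.count("1")
        out.append("1" if 2 * ones >= len(group) else "0")  # ties go to '1'
    return "".join(out)


def variable_bit_encode(codes: list, long_m: int, short_m: int,
                        top_fraction: float = 0.5) -> list:
    """Keep frequent codes at long_m bits and shrink infrequent ones to short_m."""
    freq = Counter(codes)
    ranked = [c for c, _ in freq.most_common()]  # distinct codes, most frequent first
    n_top = max(1, int(len(ranked) * top_fraction))
    high_freq = set(ranked[:n_top])
    return [fold_code(c, long_m if c in high_freq else short_m) for c in codes]


codes = ["11001100", "11001100", "11001100", "10101011"]
print(variable_bit_encode(codes, long_m=4, short_m=2))
# ['1100', '1100', '1100', '10']
```

In this toy example the four 8-bit codes (32 bits) are stored in 4 + 4 + 4 + 2 = 14 bits, which illustrates the kind of storage saving the abstract refers to.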

Highlights

Given a query object/point q and a dataset S, nearest neighbour search (NNS) [1,2,3] returns the nearest neighbours of q in S.
Based on the random Fourier feature (RFF) mapping, data are mapped into a feature space associated with an approximated kernel function, so that the inner product of any two mapped points approximates the kernel function value of the original pair.
We compared the performance of AVBH with that of several typical hashing methods: ACH [12], ITQ [19], KMH [20], PCAH [21], and LSH [13].
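To make the RFF step concrete, here is a minimal NumPy sketch of the standard random Fourier feature map for a Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)): frequencies are drawn from N(0, sigma^-2 I) and phases from U[0, 2 pi], so that z(x) · z(y) approximates k(x, y). The choice of a Gaussian kernel, the bandwidth `sigma`, and binarising each feature by its sign in the hypothetical helper `rff_hash` are assumptions; the source only states that AVBH encodes data with RFF before analysing code frequencies.

```python
import numpy as np


def make_rff(d: int, D: int, sigma: float, seed: int = 0):
    """Return a feature map z: (n, d) -> (n, D) whose inner products
    approximate the Gaussian kernel with bandwidth sigma (assumed kernel)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / sigma, size=(d, D))   # spectral samples of the kernel
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)        # random phases
    def z(X: np.ndarray) -> np.ndarray:
        return np.sqrt(2.0 / D) * np.cos(X @ W + b)  # shared W, b for every input
    return z


def rff_hash(Z: np.ndarray) -> np.ndarray:
    """Hypothetical binarisation: one bit per RFF feature, set where it is positive."""
    return (Z > 0).astype(np.uint8)


x = np.array([[0.0, 0.0]])
y = np.array([[1.0, 0.5]])
z = make_rff(d=2, D=4096, sigma=1.0, seed=0)
approx = (z(x) @ z(y).T)[0, 0]
exact = np.exp(-np.sum((x - y) ** 2) / (2 * 1.0 ** 2))
print(approx, exact)  # the estimate approaches the exact value as D grows

z8 = make_rff(d=2, D=8, sigma=1.0, seed=1)
print(rff_hash(z8(np.vstack([x, y]))))  # two 8-bit codes, one row per point
```

The same `W` and `b` must be shared by every point being encoded; otherwise the inner products (and the resulting bits) are not comparable.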

Summary

Introduction

Given a query object/point q and a dataset S, nearest neighbour search (NNS) [1,2,3] returns the nearest neighbours of q in S. Unlike direct hash code comparison, cascade-concatenating the code of each data point to the encoding length of the query point effectively reduces the storage cost of the dataset while preserving the accuracy of the result. However, this approach uses a unified compression method for all data, ignoring the effect of the data distribution. The main contributions of this paper are as follows: (1) a variable bit encoding mechanism (named AVBH) based on hash code frequency compression is proposed, which makes effective use of the encoding space, and (2) experiments show that AVBH effectively reduces the storage cost and improves query accuracy.
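The asymmetric comparison described in the introduction can be sketched as follows, assuming codes are stored as bit strings and each data code's length divides the query code length. A data code of m bits is repeated L/m times (cascade concatenation) and compared with the L-bit query code by Hamming distance. The helper names and the brute-force ranking over all data points are illustrative choices, not taken from the paper.

```python
def cascade(code: str, L: int) -> str:
    """Repeat an m-bit data code L/m times so it matches the L-bit query code."""
    if L % len(code) != 0:
        raise ValueError("data code length must divide query code length")
    return code * (L // len(code))


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def asymmetric_search(query_code: str, data_codes: list, k: int) -> list:
    """Indices of the k data codes closest to the query after cascading."""
    L = len(query_code)
    scored = sorted(
        (hamming(query_code, cascade(c, L)), i) for i, c in enumerate(data_codes)
    )
    return [i for _, i in scored[:k]]


print(asymmetric_search("11001100", ["1100", "10", "0011"], k=2))  # [0, 1]
```

Here "1100" cascades to "11001100" (distance 0), "10" to "10101010" (distance 4) and "0011" to "00110011" (distance 8), so the short codes are compared at full query length without ever being stored at that length.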

Preliminaries and Description

Asymmetric Learning to Hash with Variable Bit Encoding

Encoding Functions

Experimental Datasets

Conclusion