A presentation and retrieval hash scheme of images based on principal component analysis

Xin Ouyang, Min He, Chunyan Shuai, Jun Yang, Xu Wang

doi:10.1007/s00371-020-01973-8

Abstract

Image representation and approximate querying remain a research challenge, and both are strongly affected by the dimensionality and size of the images. Hash-based methods and binary encodings, combined with techniques such as kernel tricks, longer binary codes, and rotation of the mapping vectors, can maintain linear query time and good query accuracy, so they are widely used in this area. This paper extends principal component analysis hashing (PCAH) with unequal-length binary coding that divides images into more categories; the resulting scheme, denoted PCA-MD, aims to improve the accuracy of image representation and lookup. The paper first proves that the eigenvector mapping is locality sensitive, which is the basis for dividing images into more classes. Because the eigenvectors are anisotropic, PCA-MD uses unequal-length binary codes and fewer eigenvectors, rather than equal-length codes, to divide the images mapped onto each eigenvector into more categories. Moreover, the L1-norm distance is used to measure the distance between images, which avoids the heavy computation of the Euclidean distance. Theoretical analysis and extensive experimental results demonstrate that PCA-MD achieves higher query performance, at the cost of a slightly longer run time, than state-of-the-art approaches based on the Hamming distance. This in turn verifies that PCAH is a locality-sensitive hash and that partitioning into more than two categories is reasonable.
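The idea summarized above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that each retained eigenvector gets its own number of bits (more bits for directions with larger variance), that category boundaries are placed at quantiles of the projected data, and that lookup ranks database items by the L1 distance between their category indices. The function names (`fit_pca`, `fit_bins`, `encode`, `l1_query`), the bit allocation `[3, 2, 1]`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the data mean and the top eigenvectors of the covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]

def fit_bins(proj, bits_per_axis):
    """Quantile edges: axis j is split into 2**bits_per_axis[j] categories (assumed rule)."""
    edges = []
    for j, b in enumerate(bits_per_axis):
        qs = np.linspace(0.0, 1.0, 2**b + 1)[1:-1]  # interior quantile levels only
        edges.append(np.quantile(proj[:, j], qs))
    return edges

def encode(X, mean, W, edges):
    """Map each row of X to one category index per retained eigenvector."""
    proj = (X - mean) @ W
    cols = [np.searchsorted(e, proj[:, j]) for j, e in enumerate(edges)]
    return np.stack(cols, axis=1)

def l1_query(query_code, db_codes, k):
    """Indices of the k database codes closest to query_code in L1 distance."""
    dist = np.abs(db_codes - query_code).sum(axis=1)
    return np.argsort(dist, kind="stable")[:k]

# Synthetic, anisotropic stand-in for image feature vectors (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16)) * np.linspace(3.0, 0.5, 16)

bits_per_axis = [3, 2, 1]                           # unequal code lengths (assumed)
mean, W = fit_pca(X, n_components=len(bits_per_axis))
edges = fit_bins((X - mean) @ W, bits_per_axis)
codes = encode(X, mean, W, edges)                   # shape (200, 3)
neighbors = l1_query(codes[5], codes, k=4)
```

With three bits on the first eigenvector, that axis is split into eight categories instead of the two that a single sign bit would give, which is the sense in which the scheme partitions images into more categories. The L1 distance on category indices needs only integer subtraction and addition, so it stays cheap compared with the Euclidean distance on the raw features.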

Full Text