Improving the permutation-based proximity searching algorithm using zones and partial information

Karina Figueroa, Rodrigo Paredes, Antonio Camarena-Ibarrola, Héctor Tejeda-Villela

doi:10.1016/j.patrec.2017.04.012




Abstract

Similarity searching is a useful task in several disciplines, such as pattern recognition, machine learning, and decision theory. To solve this task, we can use an index to speed up the search. Among current indices, the permutant-based searching approach has proved efficient for high-dimensional data; however, until now it had not been adapted to low-dimensional data, where it seemed useless. We propose several ways to adapt the permutant-based searching approach to low-dimensional data: using zones while varying the distribution of the radii, trying different distance measures, and using partial distance computation. After many experiments on a synthetic database of vectors, we reached conclusions about the optimal parameter values, and then used these learned values on real databases, obtaining excellent results for k-nearest neighbor queries on both high- and low-dimensional data.
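For readers unfamiliar with the underlying technique, the following is a minimal sketch of the basic permutation-based index as it is commonly described in the literature, not the zone-based or partial-distance variant proposed in this paper. It assumes Euclidean distance on vectors and uses the Spearman footrule to compare permutations, which is one common choice. All function names and the `fraction` parameter are hypothetical, introduced only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation(obj, permutants, dist=euclidean):
    """Rank of each permutant when permutants are sorted by distance to obj.

    pos[i] is the position of permutant i in that ordering
    (ties are broken by permutant index).
    """
    order = sorted(range(len(permutants)),
                   key=lambda i: (dist(obj, permutants[i]), i))
    pos = [0] * len(permutants)
    for rank, i in enumerate(order):
        pos[i] = rank
    return pos


def footrule(p, q):
    """Spearman footrule: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def build_index(db, permutants, dist=euclidean):
    """Store one permutation per database object."""
    return [permutation(obj, permutants, dist) for obj in db]


def knn(query, db, index, permutants, k, fraction, dist=euclidean):
    """Approximate k-nearest neighbors.

    Objects are ordered by how similar their permutation is to the
    query's permutation; only the most promising `fraction` of the
    database is then compared to the query with the real distance.
    """
    qperm = permutation(query, permutants, dist)
    order = sorted(range(len(db)), key=lambda i: footrule(qperm, index[i]))
    n_candidates = min(len(db), max(k, int(fraction * len(db))))
    candidates = order[:n_candidates]
    candidates.sort(key=lambda i: dist(query, db[i]))
    return candidates[:k]  # indices into db
```

With `fraction=1.0` every object is reviewed, so the search is exact; smaller values trade recall for fewer distance computations. The number of permutants and the reviewed fraction are tuning parameters in this sketch and are not values taken from the paper.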
