Approximate Algorithms for Distance-Based Queries in High-Dimensional Data Spaces Using R-Trees

Antonio Corral, Joaquin Cañadas, Michael Vassilakopoulos

doi:10.1007/3-540-45710-0_14

Abstract

In modern database applications, the similarity or dissimilarity of complex objects is examined by performing distance-based queries (DBQs) on data of high dimensionality. The R-tree and its variations are commonly cited multidimensional access methods that can be used for answering such queries. Although the related algorithms work well for low-dimensional data spaces, their performance degrades as the number of dimensions increases (the curse of dimensionality). In order to obtain acceptable response times in high-dimensional data spaces, algorithms that obtain approximate solutions can be used. Three approximation techniques (α-allowance, N-consider and M-consider) and the respective recursive branch-and-bound algorithms for DBQs are presented and studied in this paper. We investigate the performance of these algorithms for the most representative DBQs (the K-nearest neighbors query and the K-closest pairs query) in high-dimensional data spaces, where the point data sets are indexed by tree-like structures belonging to the R-tree family: R*-trees and X-trees. The search strategy is tuned according to several parameters in order to examine the trade-off between cost (I/O activity and response time) and accuracy of the result. The outcome of the experimental evaluation is the identification of the approximate DBQ algorithm that performs best for large high-dimensional point data sets.
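The abstract names the approximation techniques without defining them, so the following is only a minimal sketch of a recursive branch-and-bound K-nearest-neighbor search with two approximation knobs, under stated assumptions. It assumes α-allowance is the relative-error pruning rule (skip a subtree when MINDIST·(1+α) is at least the current K-th best distance) and that N-consider means visiting only a fraction of each internal node's entries, taken in ascending MINDIST order. M-consider and the K-closest-pairs query are not modeled. The names `Node`, `leaf`, `internal`, `build_tree`, `mindist`, `knn`, `alpha` and `n_frac` are hypothetical, and the toy bulk-loaded tree is neither an R*-tree nor an X-tree.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """Hypothetical minimal R-tree-like node with an MBR [lo, hi]."""
    lo: Point
    hi: Point
    points: List[Point] = field(default_factory=list)     # filled in leaves
    children: List["Node"] = field(default_factory=list)  # filled in internal nodes


def leaf(points: List[Point]) -> Node:
    dims = range(len(points[0]))
    return Node(tuple(min(p[d] for p in points) for d in dims),
                tuple(max(p[d] for p in points) for d in dims), points=list(points))


def internal(children: List[Node]) -> Node:
    dims = range(len(children[0].lo))
    return Node(tuple(min(c.lo[d] for c in children) for d in dims),
                tuple(max(c.hi[d] for c in children) for d in dims), children=list(children))


def build_tree(points: List[Point], fanout: int = 8) -> Node:
    """Hypothetical bulk load: sort, then pack groups of `fanout` level by level."""
    pts = sorted(points)
    level = [leaf(pts[i:i + fanout]) for i in range(0, len(pts), fanout)]
    while len(level) > 1:
        level = [internal(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


def mindist(q: Point, node: Node) -> float:
    """Euclidean distance from q to the nearest point of the node's MBR."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, node.lo, node.hi)))


def knn(root: Node, q: Point, k: int, alpha: float = 0.0, n_frac: float = 1.0):
    """Depth-first branch-and-bound K-NN; alpha=0 and n_frac=1 give the exact answer."""
    heap: List[Tuple[float, Point]] = []  # max-heap of (-distance, point), size <= k

    def bound() -> float:
        # Current K-th best distance (infinite until k candidates are known).
        return -heap[0][0] if len(heap) == k else math.inf

    def visit(node: Node) -> None:
        if not node.children:  # leaf: compare actual points
            for p in node.points:
                d = math.dist(q, p)
                if len(heap) < k:
                    heapq.heappush(heap, (-d, p))
                elif d < bound():
                    heapq.heapreplace(heap, (-d, p))
            return
        ranked = sorted(((mindist(q, c), c) for c in node.children), key=lambda t: t[0])
        limit = max(1, math.ceil(n_frac * len(ranked)))  # assumed N-consider cut-off
        for md, child in ranked[:limit]:
            if md * (1.0 + alpha) >= bound():
                break  # assumed alpha-allowance; later children have larger MINDIST
            visit(child)

    visit(root)
    return sorted((-nd, p) for nd, p in heap)  # (distance, point), ascending
```

With `alpha=0.0` and `n_frac=1.0` the pruning is exact. A positive `alpha` prunes more aggressively, and under the assumed rule the returned K-th distance stays within a factor (1+α) of the true one. A smaller `n_frac` cuts cost further but offers no such bound. This mirrors the cost-versus-accuracy trade-off the abstract describes.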
