Access Structures for Angular Similarity Queries

Tan Apaydin, H Ferhatosmanoglu

doi:10.1109/tkde.2006.165

Abstract

Angular similarity measures have been utilized by several database applications to define semantic similarity between various data types such as text documents, time series, images, and scientific data. Although similarity searches based on Euclidean distance have been extensively studied in the database community, the processing of angular similarity searches has received comparatively little attention. A mismatch in the underlying geometry, together with the high dimensionality of the data, makes current techniques either inapplicable or poor in performance. This brings up the need for effective indexing methods for angular similarity queries. We first discuss how to efficiently process such queries and propose effective access structures suited to angular similarity measures. In particular, we propose two classes of access structures, namely, Angular-sweep and Cone-shell, which perform different types of quantization based on the angular orientation of the data objects. We also develop query processing algorithms that utilize these structures as dense indices. The proposed techniques are shown to be scalable with respect to both dimensionality and the size of the data. Our experimental results on real data sets from various applications show two to three orders of magnitude of speedup over the current techniques.
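The geometric mismatch mentioned above can be illustrated with a minimal sketch. It is not taken from the paper: the function names and the toy two-dimensional points are assumptions chosen for illustration, and the angular measure used is the plain angle between vectors (the arccosine of cosine similarity). The sketch shows only that the nearest neighbor under an angular measure need not be the nearest neighbor under Euclidean distance.

```python
import math

def angular_distance(u, v):
    """Angle in radians between two non-zero vectors (illustrative helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)

def euclidean_distance(u, v):
    """Standard Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical toy data: one query and two candidate objects.
query = (1.0, 0.0)
same_direction = (10.0, 0.0)  # far from the query, but at angle 0
nearby = (1.0, 1.0)           # close to the query, but at angle pi/4

candidates = [same_direction, nearby]
nearest_by_angle = min(candidates, key=lambda p: angular_distance(query, p))
nearest_by_euclid = min(candidates, key=lambda p: euclidean_distance(query, p))

print("nearest by angle:    ", nearest_by_angle)
print("nearest by Euclidean:", nearest_by_euclid)
```

The two measures pick different candidates here, which suggests why an index partitioned for Euclidean queries may prune poorly when the query is angular.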

Full Text