Indexing Methods for Similarity Searching

Hanan Samet

doi:10.1109/enc.2007.9

Abstract

An overview is given of the various techniques and issues involved in providing indexing support for similarity searching. Similarity searching is a crucial part of retrieval in multimedia databases used for applications such as pattern recognition, image databases, and content-based retrieval. It involves finding objects in a data set S that are similar to a query object q based on some distance measure d which is usually a distance metric. The search process is usually achieved by means of nearest neighbor finding. Existing methods for handling similarity search in this setting fall into one of two classes. The first is based on mapping to a vector space. The vector space is usually of high dimension which requires special handling due to the fact indexing methods do not discriminate well in such spaces. In particular, the query regions often overlap all of the blocks that result from the decomposition of the underlying space. This has led to some special solutions that make use of a sequential scan. An alternative is to use dimension reduction to find a mapping from a high-dimensional space into a low-dimensional space by finding the most discriminating dimensions and then index the data using one of a number of different data structures such as k-d trees, R-trees, quadtrees, etc. The second directly indexes the objects based on distances making use of data structures such as the vp-tree, M-tree, etc.

Full Text