3. Similarity searching: Indexing, nearest neighbor finding, dimensionality reduction, and embedding methods for applications in multimedia databases

Hanan Samet

doi:10.1109/icpr.2008.4760937

Abstract

Similarity searching is usually achieved by means of nearest neighbor finding. Existing methods for handling similarity search in this setting fall into one of two classes. The first is based on mapping to a low-dimensional vector space, which is then indexed using representations such as k-d trees, R-trees, quadtrees, etc. The second directly indexes the objects based on distances, using representations such as the vp-tree, M-tree, etc. Mapping from a high-dimensional space into a low-dimensional space is known as dimensionality reduction and is achieved using methods such as the singular value decomposition (SVD) and the discrete Fourier transform (DFT). At times, when only distance information is available, the data objects are embedded in a vector space so that the distances between the embedded objects, as measured by the distance metric of the embedding space, approximate the actual distances. The search in the embedding space uses conventional indexing methods, which are often coupled with dimensionality reduction. Some commonly known embedding methods are multidimensional scaling, Lipschitz embeddings, and FastMap. This tutorial is organized into five parts that cover the five basic concepts outlined above: indexing low- and high-dimensional spaces, distance-based indexing, dimensionality reduction, embedding methods, and nearest neighbor searching.
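
As an illustration of the embedding idea mentioned in the abstract, the following is a minimal sketch of a Lipschitz embedding used for filter-and-refine nearest neighbor search when only a distance function is available. It is not taken from the tutorial itself. The word list, the choice of edit distance as the metric, the randomly sampled reference sets, and all function names are assumptions made for the example. Each coordinate of an embedded object is its distance to the closest member of one reference set; by the triangle inequality, the maximum coordinate difference between two embedded objects never exceeds their true distance, so it can serve as a lower bound for pruning.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance; stands in for any metric on the objects (assumption)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def lipschitz_embed(obj, reference_sets, dist):
    """Coordinate i is the distance from obj to the closest member of reference set i."""
    return [min(dist(obj, r) for r in refs) for refs in reference_sets]


def chebyshev(u, v):
    """Maximum coordinate difference; a lower bound on the true distance."""
    return max(abs(x - y) for x, y in zip(u, v))


def nearest_neighbor(query, objects, embedded, reference_sets, dist):
    """Filter-and-refine search: visit candidates by increasing lower bound,
    and stop once the lower bound reaches the best true distance found."""
    q = lipschitz_embed(query, reference_sets, dist)
    order = sorted(range(len(objects)), key=lambda i: chebyshev(q, embedded[i]))
    best, best_d = None, float("inf")
    for i in order:
        if chebyshev(q, embedded[i]) >= best_d:
            break  # no remaining candidate can be strictly closer
        d = dist(query, objects[i])
        if d < best_d:
            best, best_d = objects[i], d
    return best, best_d


# Hypothetical data set and reference sets, chosen only for illustration.
words = ["quadtree", "octree", "r-tree", "k-d tree",
         "vp-tree", "m-tree", "fastmap", "embedding"]
rng = random.Random(0)
reference_sets = [rng.sample(words, 2) for _ in range(3)]
embedded = [lipschitz_embed(w, reference_sets, edit_distance) for w in words]

result = nearest_neighbor("kd-tree", words, embedded, reference_sets, edit_distance)
print(result)
```

In this sketch the embedded vectors are scanned in sorted order rather than stored in a spatial index; in practice the embedded points could be placed in a conventional index such as a k-d tree, which is the coupling of embedding and indexing that the abstract describes.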
