Abstract
Recent years have brought rising interest in efficiently searching for similar entities in a broad range of domains. Such search facilitates working with unstructured data such as genome sequences, text corpora, complex production information, or multimedia content, where queries always contain some amount of noise. In such domains the only common structure is a distance function obeying the axioms of a metric. Because usually no other structural information is available, many distances have to be computed in the course of a search. In contrast to classical database indexes, where optimization focuses on reducing the number of disk accesses (or, for in-memory databases, the number of tree traversal operations), a major cost driver in such domains is the number of distance calculations, each of which can be computationally very expensive. A range of index structures exists for supporting similarity search in metric spaces. A very promising one is the M‑Tree, along with a number of compatible extensions (e.g. Slim-Tree, Bulk Loaded M‑Tree, multi-way insertion M‑Tree, $$M^{2}$$-Tree, etc.). The M‑Tree family uses common algorithms for $$k$$-nearest-neighbor and range search. These algorithms leave room for optimization in terms of the number of distance calculations they require. In this paper we present new algorithms for these tasks that considerably improve the retrieval performance of all M‑Tree-compatible data structures.
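To make the cost model concrete, the following is a minimal sketch of how a metric index can skip distance calculations during a range search by using the triangle inequality. It is not the algorithm presented in the paper and not a full M‑Tree: it assumes a single level of covering balls, and the names `CountingMetric`, `Ball`, `build_balls`, and `range_search` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


class CountingMetric:
    """Wraps a distance function and counts how often it is evaluated."""

    def __init__(self, dist: Callable[[Any, Any], float]):
        self.dist = dist
        self.calls = 0

    def __call__(self, a: Any, b: Any) -> float:
        self.calls += 1
        return self.dist(a, b)


@dataclass
class Ball:
    """A routing object (center) with a covering radius over its members."""
    center: Any
    radius: float = 0.0
    members: List[Any] = field(default_factory=list)


def build_balls(points, centers, dist) -> List[Ball]:
    """Assign each point to its nearest center and track the covering radius."""
    balls = [Ball(center=c) for c in centers]
    for p in points:
        nearest = min(balls, key=lambda b: dist(b.center, p))
        nearest.members.append(p)
        nearest.radius = max(nearest.radius, dist(nearest.center, p))
    return balls


def range_search(balls, query, r, dist) -> List[Any]:
    """Return all indexed points within distance r of the query."""
    hits = []
    for ball in balls:
        d_center = dist(query, ball.center)
        # Triangle inequality: every member p satisfies
        # d(query, p) >= d(query, center) - radius, so if that lower
        # bound exceeds r the whole ball can be skipped unexamined.
        if d_center > r + ball.radius:
            continue
        hits.extend(p for p in ball.members if dist(query, p) <= r)
    return hits


if __name__ == "__main__":
    absolute = lambda a, b: abs(a - b)  # a simple metric on numbers
    points = [0, 1, 2, 3, 100, 101, 102, 103]
    balls = build_balls(points, centers=[0, 100], dist=absolute)

    metric = CountingMetric(absolute)
    print(range_search(balls, query=1.5, r=1.0, dist=metric))
    print("distance calculations:", metric.calls)
```

In this toy setting the query needs six distance evaluations instead of the eight a linear scan would perform, because the ball around 100 is excluded after a single comparison with its center. With an expensive metric, savings of this kind dominate the total query cost.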