Abstract

Similarity search in metric spaces is an important paradigm for content-based retrieval in many applications. Existing centralized search structures can speed up retrieval, but they do not scale to large volumes of data because their response time grows linearly with the size of the searched file. Four scalable and distributed similarity search structures are presented. By exploiting parallelism in a dynamic network of computers, they all achieve practically constant search time for similarity range or nearest neighbor queries in data sets of arbitrary size. Moreover, the small amount of replicated routing information kept on each server grows only logarithmically. At the same time, the potential for interquery parallelism increases as the data sets grow, because the relative number of servers used by individual queries decreases. All these properties are verified by experiments on a prototype system using real-life data sets. The results are used to establish the specific pros and cons of the individual approaches in different situations.

