Abstract

Similarity search in metric spaces is the task of finding the elements of a data repository that are similar to an element supplied by the user (the query example). Similarity functions determine which elements in the repository are similar to the query example, and indexing mechanisms improve the efficiency of the search. Classic indexing mechanisms such as LSH, M-Index, and M-Tree behave differently depending on the dimensionality of the metric space, the volume of the data repository, and the query strategy. In this paper, we describe SimSearch, a modular and flexible framework for similarity search in metric spaces that allows users to use, analyse, compare, and add indexing mechanisms, search approaches, and query strategies. SimSearch supports queries given one or more example elements and returns the set of elements most similar to the query examples, using query composition and Skyline. We show how the performance of several indexing mechanisms varies, including LSH-ML (our proposed variant of LSH), through an experimental study in two domains: images represented by feature vectors in a high-dimensional metric space, and Web Services represented by vectors of Quality of Service (QoS) parameter values.
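As an illustration of a multi-example Skyline query, the following minimal sketch computes, by brute force, the elements whose distance vectors to the query examples are not dominated by any other element. It is not SimSearch code: the function names (`euclidean`, `dominates`, `skyline_query`), the choice of Euclidean distance, and the toy data are assumptions made for this example, and no indexing mechanism is used.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance, used here as an assumed similarity function."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dominates(u: Tuple[float, ...], v: Tuple[float, ...]) -> bool:
    """u dominates v if u is no farther from every query example
    and strictly closer to at least one of them."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def skyline_query(
    data: Sequence[Vector],
    examples: Sequence[Vector],
    dist: Callable[[Vector, Vector], float] = euclidean,
) -> List[int]:
    """Return the indices of the elements in the skyline (brute force, O(n^2))."""
    # One distance vector per element: its distance to each query example.
    dvecs = [tuple(dist(p, q) for q in examples) for p in data]
    return [
        i
        for i, u in enumerate(dvecs)
        if not any(dominates(v, u) for j, v in enumerate(dvecs) if j != i)
    ]


# Hypothetical toy data: five 2-D points and two query examples.
data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (1.0, 4.0)]
examples = [(0.0, 0.0), (2.0, 2.0)]
result = skyline_query(data, examples)
print(result)  # [0, 1, 2]
```

The points at indices 3 and 4 are excluded because the point at index 1 is closer to both query examples. An indexing mechanism would typically be used to avoid computing the distance vector of every element.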
