Abstract

• We propose the novel concept of metric hulls.
• We define the covered operator using mutual distances only.
• We evaluate the proposal on synthetic and real-life data.
• Our method achieves online response times.
• We show that a few hull objects provide high coverage in comparison to the related work.

Similarity search is widely used in many online services that process unstructured and complex data, e.g., Google Images. Metric spaces are often applied to model and organize such data by their mutual similarity. As top-k queries provide only a local view of the data, a data analyst must pose multiple requests to observe the entire dataset. Thus, group-by operators for metric data have been proposed. These operators identify groups by respecting a given similarity constraint and produce a set of objects per group. The analyst can then browse these sets directly, which is tedious, but representative members may provide better insight. In this paper, we focus on concise representations of metric datasets. We propose a novel concept of a metric hull, which encompasses a given set using only a few selected objects. Testing whether an object belongs to the set then becomes much faster. We verify this concept on synthetic Euclidean data and real-life image and text datasets and show its effectiveness and scalability. The metric hulls provide much faster and more compact representations than the commonly used ball representations.
