Abstract

Given a set of points in a metric space, an additional query point and a positive threshold, a range query determines the subset of points whose distance from the query point does not exceed the given threshold. This paper tackles the problem of clustering the set of points so as to minimize the number of distance evaluations required by a range query. This problem models the efficient extraction of information from a database when the user is not interested in an exact match retrieval, but in the search for similar items. Since this need has become widespread in the management of text, image, audio and video databases, several data structures have been proposed to support such queries. Their optimization, however, is still left to extremely simple heuristic rules, if not to random choices. We propose the Balanced Compact Clustering Problem (BCCP) as a combinatorial model of this problem. We discuss its approximation properties and the complexity of special cases. Then, we present two Integer Programming formulations, prove their equivalence and introduce valid inequalities and variable fixing procedures. We discuss the application of a general-purpose solver on the more efficient formulation. Finally, we describe a Tabu Search algorithm and discuss its application to randomly generated and to real-world benchmark instances up to one hundred thousands points.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call