Abstract

We study Hamming versions of two classical clustering problems. The Hamming radius $p$-clustering problem (HRC) for a set $S$ of $k$ binary strings, each of length $n$, is to find $p$ binary strings of length $n$ that minimize the maximum Hamming distance between a string in $S$ and the closest of the $p$ strings; this minimum value is termed the $p$-radius of $S$ and is denoted by $\varrho$. The related Hamming diameter $p$-clustering problem (HDC) is to split $S$ into $p$ groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the $p$-diameter of $S$. First, we provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever $k$ and $p$ are constant. We also observe that HDC admits straightforward polynomial-time solutions when $k = O(\log n)$ or $p = 2$. Next, by reduction from the corresponding geometric $p$-clustering problems in the plane under the $L_1$ metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless $\mathrm{P} = \mathrm{NP}$. We also prove that for any $\varepsilon > 0$ it is NP-hard to split $S$ into at most $p k^{1/7-\varepsilon}$ clusters whose Hamming diameter does not exceed the $p$-diameter. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [6], HRC and HDC can be approximated within a factor of two in $O(pkn)$ time. Next, we describe a $2^{O(p\varrho/\varepsilon)} k^{O(p/\varepsilon)} n^2$-time $(1+\varepsilon)$-approximation algorithm for HRC. In particular, it runs in polynomial time when $p = O(1)$ and $\varrho = O(\log(k+n))$. Finally, we show how to find in $O\big((n/\varepsilon + kn\log n + k^2\log n)(2^{\varrho}k)^{2/\varepsilon}\big)$ time a set $L$ of $O(p\log k)$ strings of length $n$ such that for each string in $S$ there is at least one string in $L$ within distance $(1+\varepsilon)\varrho$, for any constant $0 < \varepsilon < 1$.
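To make the factor-two claim concrete, here is a minimal sketch of the textbook farthest-point (Gonzalez) heuristic applied to binary strings under Hamming distance. It is not necessarily the authors' exact adaptation. It assumes strings are given as Python `str` values over `'0'`/`'1'`. The function names `farthest_point_clustering` and `p_radius_bruteforce` are hypothetical and chosen for illustration.

The algorithm makes $p$ rounds, and each round updates $k$ distances of length $n$, which gives $O(pkn)$ time. The bound follows from a pigeonhole argument. Suppose the returned radius is $r$. Then the chosen centers, together with the farthest remaining string, are $p+1$ strings that are pairwise at distance at least $r$. Two of them must share an optimal center, so $r \le 2\varrho$. The exhaustive `p_radius_bruteforce` only checks this bound on tiny inputs.

```python
from itertools import combinations, product
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def farthest_point_clustering(S: List[str], p: int) -> Tuple[List[int], List[int], int]:
    """Farthest-point heuristic (after Gonzalez) on binary strings.

    Returns (centers, assign, radius): centers are indices into S,
    assign[i] is the position in `centers` of the center closest to S[i],
    and radius is the largest Hamming distance from a string to its center.
    """
    if not S or p < 1:
        raise ValueError("need a nonempty S and p >= 1")
    k = len(S)
    centers = [0]                           # arbitrary first center: S[0]
    dist = [hamming(s, S[0]) for s in S]    # distance to the closest chosen center
    assign = [0] * k
    while len(centers) < min(p, k):
        far = max(range(k), key=lambda i: dist[i])  # string farthest from all centers
        if dist[far] == 0:
            break                           # every string already coincides with a center
        centers.append(far)
        c = len(centers) - 1
        for i in range(k):                  # O(kn) update per round
            d = hamming(S[i], S[far])
            if d < dist[i]:
                dist[i] = d
                assign[i] = c
    return centers, assign, max(dist)


def p_radius_bruteforce(S: List[str], p: int) -> int:
    """Exact p-radius by trying every set of p centers in {0,1}^n (tiny n only)."""
    n = len(S[0])
    cube = ["".join(bits) for bits in product("01", repeat=n)]
    best = n                                # any single center is within distance n
    for cand in combinations(cube, min(p, len(cube))):
        r = max(min(hamming(s, c) for c in cand) for s in S)
        best = min(best, r)
    return best
```

As a usage example, take `S = ["0000", "0001", "1111", "1110"]` with `p = 2`. The heuristic picks `S[0]` and then `S[2]` as centers, with radius 1. This equals the exact 2-radius here, and it is within the factor-two guarantee in general.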
