Abstract

We provide a randomized linear time approximation scheme for a generic problem about clustering of binary vectors subject to additional constraints. The new constrained clustering problem generalizes a number of problems, and by solving it we obtain the first linear time approximation schemes for a number of well-studied fundamental problems concerning clustering of binary vectors and low-rank approximation of binary matrices. Among the problems solvable by our approach are Low GF(2)-Rank Approximation, Low Boolean-Rank Approximation, and various versions of Binary Clustering. For example, in the Low GF(2)-Rank Approximation problem we are given an m × n binary matrix A and an integer r > 0, and we seek a binary matrix B of GF(2)-rank at most r such that the ℓ0-norm of the matrix A − B is minimum. For this problem and any ϵ > 0, our algorithm runs in time f(r, ϵ) ⋅ n ⋅ m, where f is some computable function, and outputs a (1 + ϵ)-approximate solution with probability at least 1 − 1/e. This is the first linear time approximation scheme for these problems. We also give (deterministic) PTASes for these problems running in time n^{f(r) ⋅ (1/ϵ²) ⋅ log(1/ϵ)}, where f is some function depending on the problem. Our algorithm for the constrained clustering problem is based on a novel sampling lemma, which is interesting in its own right.
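To make the objective of Low GF(2)-Rank Approximation concrete, the following minimal Python sketch computes the GF(2)-rank of a binary matrix and the ℓ0-norm of A − B, and finds an optimal B for a tiny instance by exhaustive search. It is not the approximation scheme described in the abstract: the exhaustive search takes time exponential in m ⋅ n and serves only to illustrate the problem definition. The function names (gf2_rank, l0_cost, brute_force_low_rank) are hypothetical and chosen for this illustration.

```python
from itertools import product


def gf2_rank(matrix):
    """Rank of a 0/1 matrix over GF(2), by Gaussian elimination on bitmasks."""
    # Pack each row into an integer so that XOR is row addition modulo 2.
    masks = [sum(bit << j for j, bit in enumerate(row)) for row in matrix]
    rank = 0
    for i in range(len(masks)):
        pivot = masks[i]
        if pivot == 0:
            continue  # this row was reduced to zero by earlier pivots
        rank += 1
        low = pivot & -pivot  # lowest set bit marks the pivot column
        for k in range(i + 1, len(masks)):
            if masks[k] & low:
                masks[k] ^= pivot  # clear the pivot column in later rows
    return rank


def l0_cost(a, b):
    """l0-norm of A - B: the number of entries where A and B differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def brute_force_low_rank(a, r):
    """Exhaustive search for B with GF(2)-rank <= r minimizing l0_cost(a, B).

    Illustration only: it enumerates all 2^(m*n) binary matrices.
    """
    m, n = len(a), len(a[0])
    best_cost, best_b = None, None
    for bits in product((0, 1), repeat=m * n):
        b = [list(bits[i * n:(i + 1) * n]) for i in range(m)]
        if gf2_rank(b) <= r:
            cost = l0_cost(a, b)
            if best_cost is None or cost < best_cost:
                best_cost, best_b = cost, b
    return best_cost, best_b


# Example instance: two identical rows plus one row supported elsewhere.
A = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
cost, B = brute_force_low_rank(A, 1)
```

Note that the rank is taken over GF(2), not over the reals: the matrix with rows (1,1,0), (0,1,1), (1,0,1) has GF(2)-rank 2 because its rows sum to zero modulo 2, although its rank over the reals is 3.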

