Abstract

The Bregman k-median problem is defined as follows. Given a Bregman divergence $D_\varphi$ and a finite set $P \subseteq {\mathbb R}^d$ of size $n$, our goal is to find a set $C$ of size $k$ such that the sum of errors $\mathrm{cost}(P,C)=\sum_{p\in P}\min_{c\in C} D_\varphi(p,c)$ is minimized. The Bregman k-median problem plays an important role in many applications, e.g., information theory, statistics, text classification, and speech processing. We study a generalization of the k-means++ seeding of Arthur and Vassilvitskii (SODA '07). We prove for an almost arbitrary Bregman divergence that if the input set consists of $k$ well-separated clusters, then with probability $2^{-{\mathcal O}(k)}$ this seeding step alone finds an ${\mathcal O}(1)$-approximate solution. Thereby, we generalize an earlier result of Ostrovsky et al. (FOCS '06) from the Euclidean k-means problem to the Bregman k-median problem. Additionally, this result leads to a constant-factor approximation algorithm for the Bregman k-median problem that uses at most $2^{{\mathcal O}(k)}n$ arithmetic operations, including evaluations of the Bregman divergence $D_\varphi$.
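To make the objective and the seeding concrete, here is a minimal Python sketch. It assumes the natural generalization of k-means++ (the abstract does not spell out the exact sampling rule): the first center is drawn uniformly from $P$, and each further center is drawn with probability proportional to $\min_{c\in C} D_\varphi(p,c)$, with $D_\varphi(p,q)=\varphi(p)-\varphi(q)-\langle\nabla\varphi(q),p-q\rangle$. The helper names (`bregman`, `cost`, `bregman_seeding`) and the two example divergences are illustrative choices, not taken from the paper.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Divergence = Callable[[Point, Point], float]


def bregman(phi: Callable[[Point], float],
            grad_phi: Callable[[Point], List[float]]) -> Divergence:
    """Return D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    def d(p: Point, q: Point) -> float:
        inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
        return phi(p) - phi(q) - inner
    return d


# phi(x) = ||x||^2 gives the squared Euclidean distance (Euclidean k-means).
squared_euclidean = bregman(lambda x: sum(v * v for v in x),
                            lambda x: [2.0 * v for v in x])

# phi(x) = sum_i x_i log x_i (positive coordinates only) gives the
# generalized Kullback-Leibler divergence sum_i p_i log(p_i/q_i) - p_i + q_i.
generalized_kl = bregman(lambda x: sum(v * math.log(v) for v in x),
                         lambda x: [math.log(v) + 1.0 for v in x])


def cost(P: Sequence[Point], C: Sequence[Point], D: Divergence) -> float:
    """cost(P, C) = sum_{p in P} min_{c in C} D(p, c); the point comes first."""
    return sum(min(D(p, c) for c in C) for p in P)


def bregman_seeding(P: Sequence[Point], k: int, D: Divergence,
                    rng: Optional[random.Random] = None) -> List[Point]:
    """k-means++-style seeding with D replacing the squared distance (assumed rule)."""
    rng = rng if rng is not None else random.Random()
    centers = [rng.choice(P)]  # first center: uniform over P
    # weights[i] = min over chosen centers c of D(P[i], c), clamped at 0
    # to absorb tiny negative values from floating-point cancellation.
    weights = [max(0.0, D(p, centers[0])) for p in P]
    while len(centers) < k:
        if sum(weights) <= 0.0:
            break  # every point already coincides with a chosen center
        # Sample the next center with probability proportional to its weight.
        i = rng.choices(range(len(P)), weights=weights, k=1)[0]
        centers.append(P[i])
        weights = [min(w, max(0.0, D(p, P[i]))) for w, p in zip(weights, P)]
    return centers


if __name__ == "__main__":
    # Three well-separated groups of positive points.
    P = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1),
         (10.0, 10.0), (10.2, 9.8), (20.0, 1.0), (19.8, 1.2)]
    rng = random.Random(0)
    # Repeat the seeding and keep the cheapest center set.
    runs = [bregman_seeding(P, 3, generalized_kl, rng) for _ in range(20)]
    best = min(runs, key=lambda C: cost(P, C, generalized_kl))
    print(best, cost(P, best, generalized_kl))
```

Because the guarantee above holds for a single run only with probability $2^{-{\mathcal O}(k)}$, the `__main__` block repeats the seeding and keeps the cheapest center set; the repetition count of 20 is arbitrary and is not the bound from the paper.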
