Abstract

The $k$-means method is the method of choice for clustering large-scale data sets, and it performs exceedingly well in practice despite its exponential worst-case running time. To narrow the gap between theory and practice, $k$-means has been studied in the semi-random input model of smoothed analysis, which often leads to more realistic conclusions than mere worst-case analysis. For the case that $n$ data points in $\mathbb{R}^d$ are perturbed by Gaussian noise with standard deviation $\sigma$, it has been shown that the expected running time is bounded by a polynomial in $n$ and $1/\sigma$. This result assumes that squared Euclidean distances are used as the distance measure. In many applications, however, data is to be clustered with respect to Bregman divergences rather than squared Euclidean distances. A prominent example is the Kullback-Leibler divergence (a.k.a. relative entropy), which is commonly used to cluster web pages. To broaden the knowledge about this important class of distance measures, we analyze the running time of the $k$-means method for Bregman divergences. We first give a smoothed analysis of $k$-means with (almost) arbitrary Bregman divergences, and we show bounds of $\mathrm{poly}(n^{\sqrt{k}}, 1/\sigma)$ and $k^{kd} \cdot \mathrm{poly}(n, 1/\sigma)$. The latter yields a polynomial bound if $k$ and $d$ are small compared to $n$. On the other hand, we show that the exponential lower bound carries over to a huge class of Bregman divergences.
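
For reference (the notation here is not taken from the paper itself), a Bregman divergence is the standard generalization of the squared Euclidean distance induced by a strictly convex, differentiable function $\Phi$ on a convex domain:
\[
  d_\Phi(x, y) \;=\; \Phi(x) - \Phi(y) - \langle \nabla\Phi(y),\, x - y \rangle .
\]
Choosing $\Phi(x) = \lVert x \rVert_2^2$ recovers the squared Euclidean distance $\lVert x - y \rVert_2^2$, while the negative entropy $\Phi(x) = \sum_i x_i \log x_i$ yields, for probability vectors $x$ and $y$, the Kullback-Leibler divergence $d_{\mathrm{KL}}(x, y) = \sum_i x_i \log(x_i / y_i)$ mentioned above.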
