The analysis of a simple k-means clustering algorithm

Tapas Kanungo, Angela Y. Wu, Ruth Silverman, Nathan S. Netanyahu, Christine Piatko, David M. Mount

doi:10.1145/336154.336189

Abstract

K-means clustering is a very popular clustering technique that is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is very easy to implement. It differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than for the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real data from applications in color quantization, compression, and segmentation.
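For readers unfamiliar with the heuristic the abstract refers to, the following is a minimal sketch of the plain Lloyd iteration and of the objective it tries to reduce (mean squared distance to the nearest center). It is not the filtering algorithm from the paper: it uses no kd-tree and compares every point against every center. The function names, the random initialization from the data points, and the iteration cap are all assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_squared_distance(points, centers):
    """The k-means objective: mean squared distance to the nearest center."""
    total = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return total / len(points)


def lloyd_kmeans(points, k, iterations=100, seed=0):
    """Brute-force Lloyd iteration (hypothetical helper, not the paper's code).

    Assumption: initial centers are k distinct data points chosen at random.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the centroid of its cluster.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                centroid = tuple(
                    sum(m[i] for m in members) / len(members) for i in range(dim)
                )
                new_centers.append(centroid)
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # No center moved, so the assignment is stable.
        centers = new_centers
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    found = lloyd_kmeans(data, k=2)
    print(found, mean_squared_distance(data, found))
```

Each pass of this sketch costs on the order of n * k distance computations. That per-iteration cost is what motivates an approach, such as the one the abstract describes, that organizes the data points in a kd-tree ahead of time.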

