Abstract

Partitioning a set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means-type algorithm is best suited for implementing this operation because it clusters large numerical and categorical data sets efficiently. We present an efficient parallel k-means-type algorithm for clustering data sets on a distributed shared-nothing parallel system. Its communication scheme is simple: every iteration performs only one round of information exchange. We show that the speedup of the algorithm is asymptotically linear when the number of objects is sufficiently large. We implemented the parallel k-means-type algorithm on an IBM SP2 parallel machine, and the performance studies show that it parallelizes well in the experiments.
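To make the "one round of information exchange per iteration" idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes numerical data only, simulates the processors sequentially in a single process, and assumes that the exchanged information consists of per-cluster coordinate sums and counts; on a real shared-nothing machine that combination step would typically be a single all-reduce. The names `local_step` and `parallel_kmeans` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def local_step(points: Sequence[Point], centroids: Sequence[Point]):
    """Work done by one processor on its own partition (no communication).

    Assigns each local point to its nearest centroid and returns the
    per-cluster coordinate sums and point counts.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Nearest centroid by squared Euclidean distance.
        j = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def parallel_kmeans(
    partitions: Sequence[Sequence[Point]],
    centroids: Sequence[Point],
    max_iter: int = 100,
) -> List[Point]:
    """Simulated parallel k-means: one exchange of partial results per iteration."""
    centroids = [tuple(c) for c in centroids]
    k, dim = len(centroids), len(centroids[0])
    for _ in range(max_iter):
        # Each "processor" handles only its own partition (simulated in a loop).
        partials = [local_step(part, centroids) for part in partitions]

        # Single exchange round (assumed to be an all-reduce in a real system):
        # combine the partial sums and counts, then recompute every centroid.
        new_centroids = []
        for j in range(k):
            n = sum(counts[j] for _, counts in partials)
            if n == 0:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
            else:
                new_centroids.append(
                    tuple(sum(sums[j][d] for sums, _ in partials) / n for d in range(dim))
                )

        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids
```

In this sketch the data exchanged per iteration depends only on the number of clusters and dimensions, not on the number of objects, which is consistent with the communication cost becoming negligible relative to local work as the data set grows.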

