PARCLE: a parallel clustering algorithm for cluster system

Bing Zhou, Jun-Yi Shen, Qin-Ke Peng

doi:10.1109/icmlc.2003.1264431

Abstract

As a low-cost, general-purpose parallel computing platform that is easy to use and dependable, the cluster system has become popular in many fields. Cluster analysis is one of the important problems in data mining. Because it typically operates on large-scale databases or high-dimensional data, clustering demands substantial computing power, so developing parallel clustering algorithms for cluster systems deserves attention. This paper proposes PARCLE, a new parallel clustering algorithm for very large databases that is suited to cluster systems. The algorithm adopts data parallelism and asynchronous communication to reduce communication costs, and it applies a new clustering algorithm derived from BIRCH to improve clustering quality. Our implementation shows high speedups with negligible communication overhead, and its clustering results are no worse than those of the linear clustering algorithm.
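The abstract does not describe PARCLE's internals, so the following is only a minimal sketch of why a BIRCH-style summary lends itself to data parallelism. BIRCH represents a group of points by a clustering feature (N, LS, SS): the point count, the per-dimension linear sum, and the sum of squared norms. These triples are additive, so each node can summarize its own data partition and only the small summaries need to be exchanged. The names `CF`, `summarize`, and `partition` are hypothetical, the round-robin split is an assumption, and asynchronous communication is not modeled here.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class CF:
    """BIRCH-style clustering feature: count, linear sum, sum of squared norms."""
    n: int
    ls: List[float]
    ss: float

    def merge(self, other: "CF") -> "CF":
        # Additivity: the CF of a union is the component-wise sum of the CFs.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of the summarized points from the centroid.
        c2 = sum(x * x for x in self.centroid())
        return sqrt(max(self.ss / self.n - c2, 0.0))


def summarize(points: Sequence[Sequence[float]]) -> CF:
    """Build the CF of a non-empty list of points (the work done on one node)."""
    ls = [0.0] * len(points[0])
    ss = 0.0
    for p in points:
        for i, x in enumerate(p):
            ls[i] += x
            ss += x * x
    return CF(len(points), ls, ss)


def partition(points: Sequence[Sequence[float]], workers: int) -> List[list]:
    """Hypothetical round-robin split of the data across worker nodes."""
    return [list(points[i::workers]) for i in range(workers)]


# Illustrative data only, not taken from the paper.
points = [(1.0, 2.0), (2.0, 2.0), (3.0, 4.0), (10.0, 10.0)]

# Each "node" summarizes its own partition independently.
partials = [summarize(part) for part in partition(points, 2)]

# Only the small CF triples are combined, never the raw points.
merged = partials[0]
for cf in partials[1:]:
    merged = merged.merge(cf)

# Summarizing the whole data set on one node gives the same triple.
whole = summarize(points)
```

Because the merged summary matches the single-node summary, the amount of data exchanged between nodes depends on the number of summaries rather than on the number of points, which is one plausible route to the low communication cost the abstract reports.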
