Abstract

As a low-cost, general-purpose parallel computing platform that is easy to use and dependable, the cluster system has become popular in many fields. Cluster analysis is one of the important problems in data mining. Because it is usually applied to large-scale databases or high-dimensional data, clustering demands substantial computing power, so developing parallel clustering algorithms for cluster systems deserves attention. This paper proposes a new parallel clustering algorithm, called PARCLE, for very large databases that is suited to cluster systems. The algorithm adopts data parallelism and asynchronous communication to reduce communication costs, and it applies a new clustering algorithm derived from BIRCH to improve clustering quality. Our implementation shows high speedups with negligible communication overhead and clustering results no worse than those of the linear clustering algorithm.
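
The abstract does not give PARCLE's details, so the following is only a minimal sketch of the general idea it names: data parallelism over BIRCH-style summaries. It relies on one known property of BIRCH, that a clustering feature (point count, linear sum, sum of squares) is additive, so partitions can be summarized independently and merged in whatever order results arrive. The names `CF`, `summarize`, and `parallel_summary` are hypothetical, threads stand in for cluster nodes, and the input is assumed to be non-empty; none of this is taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class CF:
    """BIRCH-style clustering feature: count, linear sum, sum of squared norms."""
    n: int
    ls: List[float]
    ss: float

    def merge(self, other: "CF") -> "CF":
        # CFs are additive, so partial summaries combine without revisiting the data.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]


def summarize(points: Sequence[Sequence[float]]) -> CF:
    """Build the CF of one partition (the local work of one node in this sketch)."""
    dim = len(points[0])
    ls = [sum(p[d] for p in points) for d in range(dim)]
    ss = sum(x * x for p in points for x in p)
    return CF(len(points), ls, ss)


def parallel_summary(points: Sequence[Sequence[float]], workers: int = 4) -> Optional[CF]:
    """Split the data, summarize partitions concurrently, merge as results arrive."""
    chunks = [points[i::workers] for i in range(workers)]
    chunks = [c for c in chunks if c]  # drop empty partitions
    total: Optional[CF] = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(summarize, c) for c in chunks]
        for fut in as_completed(futures):  # merge order does not matter
            part = fut.result()
            total = part if total is None else total.merge(part)
    return total


if __name__ == "__main__":
    data = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    cf = parallel_summary(data, workers=2)
    print(cf.n, cf.centroid())
```

Because merging is associative and commutative, a coordinator in this sketch never has to wait for partitions in a fixed order, which is the kind of property that makes asynchronous communication attractive. A full BIRCH-derived method would maintain a tree of many such features rather than a single global one.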
