Abstract

With dataset volumes growing in many areas and distributed technologies developing rapidly, parallel clustering is becoming increasingly important. To cluster large-scale data of various shapes, this paper proposes a parallel Chameleon clustering algorithm. The key idea is to first build the k-nearest neighbor graph of the original dataset in parallel, using a scheme inspired by matrix multiplication, then apply a parallel minimum spanning tree algorithm to generate the initial clusters, and finally merge those clusters with the strategies of the original Chameleon algorithm to obtain the final clusters. We implement the parallel Chameleon clustering on MapReduce. Experiments show that the algorithm is efficient and produces good clustering results.
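
The first two stages described above can be illustrated with a small, single-machine sketch. It is not the paper's MapReduce implementation: the function names `knn_edges` and `initial_clusters`, the `n_clusters` parameter, and the rule of cutting the longest spanning-tree edges to form initial clusters are all assumptions made for illustration. Distances are derived from the Gram matrix X Xᵀ, which is one way a matrix-multiplication-style block decomposition could be used; the Chameleon merging stage is omitted.

```python
def knn_edges(points, k):
    """Return undirected k-NN edges as sorted (squared_distance, i, j), i < j."""
    n = len(points)
    # Gram matrix G = X X^T. Each block of this product can be computed
    # independently, which is what makes the step easy to parallelize.
    gram = [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]
    edges = set()
    for i in range(n):
        # ||x_i - x_j||^2 = G[i][i] + G[j][j] - 2 * G[i][j]
        dists = sorted(
            (gram[i][i] + gram[j][j] - 2 * gram[i][j], j)
            for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            edges.add((d, min(i, j), max(i, j)))
    return sorted(edges)


def initial_clusters(points, k, n_clusters):
    """Kruskal's algorithm on the k-NN graph, stopped once n_clusters remain.

    Stopping early is equivalent to building the minimum spanning tree and
    removing its longest edges (an assumed rule, not taken from the paper).
    If the k-NN graph is disconnected, more than n_clusters groups may remain.
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for _, i, j in knn_edges(points, k):
        if components <= n_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(initial_clusters(points, k=2, n_clusters=2))  # [[0, 1, 2], [3, 4, 5]]
```

In a distributed setting, the Gram matrix rows and the per-point neighbor selection would be split across workers; the sequential loops here only show the logic of each step.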

