Adaptive Affinity Propagation Clustering in MapReduce Environment

Wei-Chih Hung, Cheng-Yuan Tang, Yuan-Cheng Liu, Maw-Kae Hor, Yi-Leh Wu

doi:10.1007/978-3-319-13987-6_20

Abstract

Affinity Propagation (AP) is a clustering algorithm based on the concept of “message passing” between data points. Unlike most clustering algorithms, such as k-means, AP does not require the number of clusters to be determined or estimated before the algorithm is run. An implementation of AP on Hadoop, a distributed cloud environment, exists and is called Map/Reduce Affinity Propagation (MRAP). However, MRAP has a limitation: it is hard to know which value of the “preference” parameter yields an optimal clustering solution. The Adaptive Affinity Propagation Clustering (AAP) algorithm was proposed to overcome this limitation by determining the preference value in AP. In this study, we combine the two methods into Adaptive Map/Reduce Affinity Propagation (AMRAP), which divides the clustering task among multiple mappers and one reducer in Hadoop and determines a suitable preference value individually for each mapper. In the experiments, we compare the clustering results of the proposed AMRAP with those of the original MRAP method. The experimental results indicate that the proposed AMRAP method outperforms the original MRAP method in terms of accuracy.
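To make the role of the “preference” parameter concrete, the following is a minimal single-machine sketch of the standard AP message-passing updates (responsibility and availability). It is not the Hadoop implementation described in this paper, and it does not include the adaptive preference search of AAP. The function name `affinity_propagation`, the damping factor, the iteration count, the toy data, and the choice of the median similarity as the preference are all assumptions made for illustration.

```python
import numpy as np

def affinity_propagation(S, preference, damping=0.7, iterations=300):
    """Illustrative AP sketch: S is an n x n similarity matrix (hypothetical helper)."""
    S = np.array(S, dtype=float)          # work on a copy
    n = S.shape[0]
    np.fill_diagonal(S, preference)       # s(k, k) is the "preference" of point k
    R = np.zeros((n, n))                  # responsibilities r(i, k)
    A = np.zeros((n, n))                  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        # a(k, k) = sum over i' != k of max(0, r(i', k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new
    # Points with a(k, k) + r(k, k) > 0 are taken as exemplars.
    exemplars = np.flatnonzero((A + R).diagonal() > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    # Each point joins its most similar exemplar; exemplars label themselves.
    labels = exemplars[S[:, exemplars].argmax(axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels

# Toy data (assumed): two well-separated groups on a line.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
S = -(x[:, None] - x[None, :]) ** 2       # negative squared distance
preference = np.median(S)                 # one common heuristic, assumed here
exemplars, labels = affinity_propagation(S, preference)
print(exemplars, labels)
```

The number of clusters is never passed in; it emerges from the preference value, with values closer to zero generally yielding more exemplars and more negative values yielding fewer. This sensitivity is the difficulty the abstract attributes to MRAP.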
