Community discovery by propagating local and global information based on the MapReduce model

Kun Guo, Wenzhong Guo, Yuzhong Chen, Qirong Qiu, Qishan Zhang

doi:10.1016/j.ins.2015.06.032

Abstract

Discovering communities in large-scale social networks efficiently and accurately is one of the challenges in social network data mining. We propose a clustering algorithm that discovers social network communities by propagating local and global information. The algorithm employs three strategies, namely localizing the propagation of affinity messages, relaxing self-exemplar constraints, and hierarchical processing, to achieve reasonable time and space complexities on social networks. The local and global information is represented by the k-path edge centrality, which is incorporated into the similarity calculation. The standalone algorithm is extended with parallel implementations based on the MapReduce model to accelerate processing of large-scale networks. Two well-known parallel computation frameworks, Hadoop and Spark, are adopted to implement the parallel algorithm. Experiments on artificial and real social network datasets show that the proposed algorithms can achieve near-linear time and space complexities with comparable clustering accuracy.
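To make the first strategy concrete, the following is a minimal sketch of standard affinity propagation in which messages are exchanged only along graph edges, so each node considers only itself and its neighbors as candidate exemplars. It is an illustration under stated assumptions, not the authors' implementation: the function name, the parameters `preference`, `damping`, and `iterations`, and the edge similarities are all hypothetical, the similarities are placeholder numbers rather than values derived from k-path edge centrality, and neither the relaxed self-exemplar constraints nor the hierarchical processing is modeled.

```python
from collections import defaultdict


def local_affinity_propagation(edges, preference=-1.0, damping=0.5, iterations=50):
    """Affinity propagation with messages restricted to graph edges.

    edges: dict mapping an undirected pair (u, v) to a similarity value
           (placeholder values here; not k-path edge centrality).
    Returns a dict mapping each node to its chosen exemplar.
    """
    # Candidate exemplars of a node are its neighbors and the node itself.
    sim = defaultdict(dict)
    for (u, v), s in edges.items():
        if u == v:
            continue  # self-loops are ignored in this sketch
        sim[u][v] = s
        sim[v][u] = s
    for node in list(sim):
        sim[node][node] = preference

    resp = {i: {k: 0.0 for k in sim[i]} for i in sim}
    avail = {i: {k: 0.0 for k in sim[i]} for i in sim}

    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in sim:
            for k in sim[i]:
                rivals = [avail[i][j] + sim[i][j] for j in sim[i] if j != k]
                new = sim[i][k] - max(rivals)
                resp[i][k] = damping * resp[i][k] + (1 - damping) * new

        # a(k,k) = sum of positive responsibilities sent to k by its neighbors
        # a(i,k) = min(0, r(k,k) + that sum without i's own contribution)
        for k in sim:
            positive = {i: max(0.0, resp[i][k]) for i in sim[k] if i != k}
            total = sum(positive.values())
            avail[k][k] = damping * avail[k][k] + (1 - damping) * total
            for i, own in positive.items():
                new = min(0.0, resp[k][k] + total - own)
                avail[i][k] = damping * avail[i][k] + (1 - damping) * new

    # Each node picks the candidate that maximizes a(i,k) + r(i,k).
    return {i: max(sim[i], key=lambda k: avail[i][k] + resp[i][k]) for i in sim}


# Toy graph: two triangles joined by one weak edge (similarities are made up).
edges = {
    (0, 1): -0.1, (0, 2): -0.1, (1, 2): -0.1,
    (3, 4): -0.1, (3, 5): -0.1, (4, 5): -0.1,
    (2, 3): -2.0,
}
exemplars = local_affinity_propagation(edges)
print(exemplars)
```

Because each node stores messages only for its own neighborhood, the work per iteration grows with the number of edges rather than with the square of the number of nodes, which is the motivation for localizing the propagation on sparse social networks.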

Full Text