Abstract

With the rapid growth of social networks, the computational cost of analyzing them is increasing, and many existing community detection algorithms are not suitable for large-scale data. Apache Spark is an open-source cluster computing framework that makes it possible to solve the community detection problem on a cluster of computers. In this paper, we propose a novel label propagation algorithm on Spark, called PSPLPA (Probability and Similarity based Parallel Label Propagation Algorithm). PSPLPA employs a new label updating strategy that uses label probability in each iteration of the propagation procedure. First, a weight calculation based on k-shell is integrated into the label initialization process. Second, parallel propagation steps are designed to use label probability efficiently. Third, randomness in label updating is significantly reduced through automatic label selection and similarity computation. Experiments conducted on artificial and real social networks demonstrate that the proposed algorithm exhibits high scalability and high accuracy.
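
The abstract names three ingredients: k-shell weights, label propagation, and a similarity term that reduces randomness in label updates. The following single-machine Python sketch illustrates how such ingredients can fit together. It is not the PSPLPA algorithm itself and does not use Spark. The function names (`k_shell`, `jaccard`, `propagate`), the Jaccard similarity over closed neighborhoods, the scoring rule (k-shell weight times similarity), and the smallest-label tie-break are all assumptions made for illustration.

```python
from collections import defaultdict


def k_shell(adj):
    """Return the k-shell index (core number) of each node by repeated peeling."""
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # The current shell is at least the smallest remaining degree.
        k = max(k, min(degree[v] for v in remaining))
        peel = [v for v in remaining if degree[v] <= k]
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed via a duplicate entry
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return shell


def jaccard(adj, v, u):
    """Jaccard similarity of closed neighborhoods (assumed similarity measure)."""
    a = set(adj[v]) | {v}
    b = set(adj[u]) | {u}
    return len(a & b) / len(a | b)


def propagate(adj, max_iter=20):
    """Deterministic label propagation weighted by k-shell and similarity.

    Assumption: a neighbor u votes for its label with weight
    k_shell(u) * jaccard(v, u); ties go to the smallest label.
    """
    weight = k_shell(adj)
    labels = {v: v for v in adj}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):  # fixed order, asynchronous updates
            score = defaultdict(float)
            for u in adj[v]:
                score[labels[u]] += weight[u] * jaccard(adj, v, u)
            if not score:
                continue  # isolated node keeps its own label
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels


# Toy graph: two triangles joined by the single bridge edge 2-3.
graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}
communities = propagate(graph)
```

On this toy graph the similarity term gives the bridge edge a low vote, so each triangle ends up with its own label. With plain unweighted votes and the same tie-break, one label would spread across the bridge and merge both triangles.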
