Research and Improvement of Community Discovery Algorithm Based on Spark for Large Scale Complicated Networks

Hui Kang, Shengquan Chen, Lingfeng Lu, Chenkun Meng

doi:10.1109/trustcom50675.2020.00204

Abstract

Community discovery is one of the important topics in complex network research. However, traditional community discovery algorithms have several problems, such as the oscillation of label propagation, slow or unstable iterative convergence, the formation of a single large community with one label (the so-called "monster community"), and poor community discovery results caused by treating all nodes equally. In addition, with the advent of the era of data, the computing power of a single computer cannot meet the demands of rapidly growing complex network scale. To address these problems, this paper studies and improves a community discovery algorithm based on Spark for large-scale complex networks. First, an unweighted complex network is assigned edge weights. Second, the classic and efficient label propagation algorithm is chosen as the base community discovery algorithm, and its label initialization, label propagation and label update strategy, and iterative convergence strategy are optimized to establish a new community discovery algorithm model. Third, the algorithm is ported to Spark, synchronized through GraphX programming, and a Spark experiment platform is established. Finally, the proposed algorithm is tested on several classic complex network data sets and several large-scale complex network data sets and compared with classic community discovery algorithms. The results verify that the proposed algorithm, running on the Spark GraphX platform, greatly improves the computational performance of community discovery in complex networks.
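
For orientation, a minimal single-machine sketch of label propagation on a weighted network is given here. It is not the algorithm proposed in this paper and does not use Spark or GraphX. The function name `weighted_label_propagation`, the edge-list input format, the in-place (asynchronous) update order, and the rule of keeping the current label on ties are all assumptions, chosen only to show how edge weights and a simple convergence check can enter the basic label propagation loop.

```python
from collections import defaultdict


def weighted_label_propagation(edges, max_iters=20):
    """Illustrative weighted label propagation (hypothetical helper).

    edges: iterable of (u, v, weight) tuples for an undirected graph.
    Returns a dict mapping each node to its final community label.
    """
    # Build a weighted adjacency map.
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w

    # Label initialization: every node starts in its own community.
    labels = {node: node for node in adj}

    for _ in range(max_iters):
        changed = False
        # Assumption: nodes are visited in sorted order and updated in place,
        # a common way to damp the oscillation of fully synchronous updates.
        for node in sorted(adj):
            # Sum the edge weights behind each neighboring label.
            score = defaultdict(float)
            for nbr, w in adj[node].items():
                score[labels[nbr]] += w
            best = max(score.values())
            # Keep the current label if it is already among the best.
            if score.get(labels[node], 0.0) == best:
                continue
            # Deterministic tie-break: smallest label among the best.
            labels[node] = min(l for l, s in score.items() if s == best)
            changed = True
        # Convergence check: stop once a full pass changes nothing.
        if not changed:
            break
    return labels


# Two triangles joined by one weak edge (toy data, not from the paper).
edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 0.1),
]
labels = weighted_label_propagation(edges)
```

On this toy graph the weak bridge between nodes 2 and 3 carries too little weight to pull either triangle into the other, so the two triangles end up with different labels.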

Full Text