PASLPA - Overlapping Community Detection in Massive Real Networks Using Apache Spark

Navid Sedighpour, Alireza Bagheri

doi:10.1109/istel.2018.8661093

Abstract

A community is a part of a graph in which the connections are denser than in the rest of the graph. Community detection refers to the algorithms and methods that attempt to find communities in a graph. It helps data scientists extract meaningful information from a network (graph) for use in their analyses. A method that can identify communities with high performance and quality is therefore essential. Spark, a newer parallel programming framework based on MapReduce, can reuse a working set of data across multiple parallel operations while retaining the scalability and fault tolerance of MapReduce. It is currently one of the best ways to perform parallel computing on big data. The Speaker-Listener Label Propagation Algorithm (SLPA) is one of the well-known algorithms for overlapping community detection. It is popular because of its low time complexity, its local computation, and the high quality of the communities it identifies. However, SLPA has a very long execution time on very large graphs and is even unusable in some cases. In this paper, we propose an improved version of SLPA called Parallel Advanced SLPA (PASLPA). First, we reduce the run time of SLPA by modifying its second phase (Evolution). Second, we implement PASLPA in parallel using the Apache Spark framework.
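The abstract refers to SLPA's second phase (Evolution) without describing it. For reference, the following is a minimal sequential sketch of standard SLPA as it is commonly described: initialization, evolution through repeated speaker-listener exchanges, and threshold-based post-processing. It is not PASLPA, because the abstract does not specify the authors' modified evolution phase or their Spark implementation. The function name `slpa`, the parameters `iterations`, `threshold` and `seed`, and the adjacency-dict input format are illustrative assumptions.

```python
import random
from collections import Counter


def slpa(adjacency, iterations=20, threshold=0.1, seed=0):
    """Sequential sketch of standard SLPA (hypothetical helper, not PASLPA).

    adjacency: dict mapping node -> list of neighbour nodes (assumed format).
    Returns a dict mapping node -> set of community labels (overlaps allowed).
    """
    rng = random.Random(seed)
    # Phase 1 (initialization): each node's memory holds its own ID once.
    memory = {node: Counter({node: 1}) for node in adjacency}
    nodes = list(adjacency)

    # Phase 2 (evolution): repeat the speaker-listener exchange T times.
    for _ in range(iterations):
        rng.shuffle(nodes)  # visit listeners in a random order each round
        for listener in nodes:
            received = []
            for speaker in adjacency[listener]:
                labels = list(memory[speaker])
                weights = [memory[speaker][label] for label in labels]
                # Speaking rule: send a label with probability proportional
                # to its frequency in the speaker's memory.
                received.append(rng.choices(labels, weights=weights)[0])
            if not received:
                continue  # isolated node: nothing to listen to
            # Listening rule: keep the most frequent received label,
            # breaking ties at random, and add it to the listener's memory.
            counts = Counter(received)
            top = max(counts.values())
            winner = rng.choice([l for l, c in counts.items() if c == top])
            memory[listener][winner] += 1

    # Phase 3 (post-processing): keep labels whose relative frequency in a
    # node's memory is at least the threshold r.
    communities = {}
    for node, mem in memory.items():
        total = sum(mem.values())
        communities[node] = {l for l, c in mem.items() if c / total >= threshold}
    return communities


if __name__ == "__main__":
    # Two disconnected triangles: labels can never cross between them.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
    print(slpa(graph, iterations=20, threshold=0.1, seed=1))
```

The evolution loop touches every edge once per iteration, so its cost grows roughly with the number of iterations times the number of edges. This makes it the natural target for both of the abstract's changes: reducing the work in the evolution phase and distributing it across a Spark cluster. The full algorithm also includes a step that removes nested communities, which this sketch omits.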
