Abstract

In a social network, small or large communities within the network play a major role in determining the functionality of the network. Despite diverse definitions, communities may be defined as groups of nodes that are more densely connected to each other than to nodes outside the group. Revealing such hidden communities is a challenging research problem. Real-world social networks exhibit the small-world phenomenon, which indicates that any two social entities can reach each other in a small number of steps. In this paper, nodes are mapped into communities based on random walks in the network. However, uncovering communities in large-scale networks is a challenging task due to the unprecedented growth in the size of social networks. A good number of community detection algorithms based on random walks exist in the literature; however, when large-scale social networks are considered, these algorithms are observed to take considerably more time. In this work, with the objective of improving efficiency, a parallel programming framework, Map-Reduce, has been considered for uncovering hidden communities in social networks. The proposed approach has been compared with some standard existing community detection algorithms on both synthetic and real-world datasets in order to examine its performance, and it is observed that the proposed algorithm is more efficient than the existing ones.
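The abstract states that nodes are mapped to communities using random walks and that Map-Reduce is used for efficiency, but it gives no algorithmic detail. As a hedged illustration only, and not the paper's CDSW algorithm, the following Python sketch shows how one step of a simple random walk can be phrased as a map, shuffle, and reduce round: each node spreads its current probability mass equally over its neighbors, and the reducer sums the mass arriving at each node. The toy graph `GRAPH` and the functions `mapper`, `shuffle`, `reducer`, and `walk_distribution` are hypothetical, and the shuffle is simulated in memory rather than run on a real Map-Reduce framework.

```python
from collections import defaultdict

# Hypothetical toy network: two triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
# Assumption: every node has at least one neighbor (no isolated nodes).
GRAPH = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}


def mapper(node, prob, neighbors):
    """Map: spread the walker's probability at `node` uniformly over its neighbors."""
    share = prob / len(neighbors)
    for nbr in neighbors:
        yield nbr, share


def shuffle(pairs):
    """Shuffle: group emitted values by key (node), as the framework would."""
    groups = defaultdict(list)
    for node, share in pairs:
        groups[node].append(share)
    return groups


def reducer(node, shares):
    """Reduce: total probability of the walker being at `node` after this step."""
    return node, sum(shares)


def walk_distribution(graph, start, steps):
    """Probability distribution of a simple random walk after `steps` steps."""
    dist = {start: 1.0}
    for _ in range(steps):  # one map/shuffle/reduce round per walk step
        emitted = (pair for node, prob in dist.items()
                   for pair in mapper(node, prob, graph[node]))
        dist = dict(reducer(node, shares) for node, shares in shuffle(emitted).items())
    return dist


print(walk_distribution(GRAPH, start=0, steps=2))
```

On the toy graph, after two steps starting from node 0, five sixths of the probability mass is still inside the triangle {0, 1, 2} and none has reached nodes 4 or 5; this tendency of short walks to stay inside densely connected groups is the usual intuition behind random-walk community detection. Because each walk step depends only on the previous distribution, every step maps naturally onto one Map-Reduce round.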

Highlights

  • In the real world, various categories of networks play different roles in society for different purposes, viz. social networks, which represent social interactions among human beings; citation networks, which represent articles published by various authors in a particular field and their citations in other papers; technological networks, which represent the distribution of resources; biological networks, which represent protein–protein interactions; etc.

  • The detected partition can be evaluated with the help of a confusion matrix (CM), where each row corresponds to a community present in the real partition and each column corresponds to a community detected by the proposed algorithm

  • It may be inferred that the Community detection using small world phenomenon (CDSW) algorithm differs significantly from all other algorithms with respect to the metrics considered for comparison
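The highlights describe a confusion matrix whose rows are the real communities and whose columns are the detected ones, but they do not say which score is derived from it. The Python sketch below builds such a matrix from two label lists and then computes normalized mutual information (NMI) in its widely used confusion-matrix form; treating NMI as the paper's metric is an assumption, and the function names `confusion_matrix` and `nmi` are hypothetical.

```python
import math
from collections import Counter


def confusion_matrix(real, detected):
    """Rows: communities of the real partition; columns: detected communities.

    real[i] and detected[i] are the community labels of node i. Entry (r, c)
    counts nodes of real community r that were placed in detected community c.
    """
    rows = sorted(set(real))
    cols = sorted(set(detected))
    counts = Counter(zip(real, detected))
    return [[counts[(r, c)] for c in cols] for r in rows]


def nmi(cm):
    """Normalized mutual information from a confusion matrix (1 = identical partitions)."""
    n = sum(map(sum, cm))
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(col) for col in zip(*cm)]
    num = sum(nij * math.log(nij * n / (row_sums[i] * col_sums[j]))
              for i, row in enumerate(cm) for j, nij in enumerate(row) if nij)
    den = (sum(ni * math.log(ni / n) for ni in row_sums)
           + sum(nj * math.log(nj / n) for nj in col_sums))
    if den == 0:  # both partitions put every node into a single community
        return 1.0
    return -2.0 * num / den


real = [0, 0, 0, 1, 1, 1]
detected = ["b", "b", "b", "a", "a", "a"]
cm = confusion_matrix(real, detected)
print(cm)  # [[0, 3], [3, 0]]
print(nmi(cm))
```

The detected labels differ from the real ones only by renaming, so the partitions are identical and NMI is 1; a partition unrelated to the real one drives NMI towards 0. Working from the confusion matrix keeps the score independent of how the detecting algorithm names its communities.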


Summary

Introduction

Various categories of networks play different roles in society for different purposes, viz. social networks, which represent social interactions among human beings; citation networks, which represent articles published by various authors in a particular field and their citations in other papers; technological networks, which represent the distribution of resources; biological networks, which represent protein–protein interactions; etc. Power-law degree distributions [7], small-world networks [8], and community structure are some of the important properties observed in social networks. Real-world social networks are observed to follow a power law in both degree distribution and community size distribution [7]. In a small-world network with a fixed average degree, the average path length between pairs of nodes increases logarithmically with the number of nodes; in other words, the number of nodes reachable from a given node grows exponentially with walk length [10]. These inherent properties of real-world networks make graph mining difficult. The rest of this paper is organized as follows: in Section 3, some preliminaries about community structure, small-world networks, power-law degree distribution, and clustering coefficient are discussed.
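To make two of the properties above concrete, the following minimal Python sketch (not taken from the paper) computes the average shortest-path length by breadth-first search and the average local clustering coefficient of an unweighted, undirected graph. The toy graph `EXAMPLE` and all function names are hypothetical. For a small-world network, the average path length computed this way grows roughly logarithmically with the number of nodes.

```python
from collections import deque

# Hypothetical toy network: two triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
EXAMPLE = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}


def bfs_distances(graph, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(graph):
    """Mean shortest-path length over all ordered pairs of distinct, reachable nodes."""
    total = pairs = 0
    for s in graph:
        for t, d in bfs_distances(graph, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


def local_clustering(graph, u):
    """Fraction of pairs of u's neighbors that are themselves connected."""
    nbrs = list(graph[u])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i, a in enumerate(nbrs) for b in nbrs[i + 1:] if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean of the local clustering coefficients of all nodes."""
    return sum(local_clustering(graph, u) for u in graph) / len(graph)


print(average_path_length(EXAMPLE))  # 1.8
print(average_clustering(EXAMPLE))
```

Running a BFS from every node costs on the order of N·(N + E) time for N nodes and E edges, which is fine for a toy graph but quickly becomes impractical for large social networks; this is one reason such measurements are usually approximated or parallelized at scale.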

Related Work
Structural Definition of Community
Small World Phenomenon
Power Law Degree Distribution
Clustering Coefficient
Random Walk
Proposed Methodology
Metrics for Evaluation Performance
Datasets Used
Synthetic Dataset
Real World Datasets
Experimental Results
Threat to Validity
Conclusions