Abstract

We use four criteria to evaluate a novel community detection algorithm: (a) effectiveness, meaning the ability to produce high normalized mutual information (NMI) and modularity values on well-known social networks; (b) examination, meaning the ability to mitigate resolution limit problems, assessed using NMI values on synthetic networks; (c) correctness, meaning the ability to identify useful community structures, assessed using NMI values on Lancichinetti-Fortunato-Radicchi (LFR) benchmark networks; and (d) scalability, meaning the ability to produce comparable modularity values with fast execution times on large-scale real-world networks. In addition to describing a simple hierarchical arc-merging (HAM) algorithm that uses network topology information, we introduce rule-based arc-merging strategies for identifying community structures. Five well-studied social network datasets and eight sets of LFR benchmark networks were used to validate correctness against ground-truth communities, eight large-scale real-world complex networks were used to measure efficiency, and two synthetic networks were used to determine susceptibility to two resolution limit problems. Our experimental results indicate that the proposed HAM algorithm achieved satisfactory performance efficiency, that HAM-identified and ground-truth communities were comparable for the social and LFR benchmark networks, and that the algorithm mitigated resolution limit problems.

Highlights

  • Many real-world systems can be expressed as networks consisting of nodes connected by edges [1,2,3]

  • Network topology is represented as an adjacency matrix A = {a_ij} with a_ij ∈ R^n, where a_ij = 1 if an edge e_ij exists between nodes i and j, otherwise a_ij = 0. w_ij = w_ji denotes the weight of an edge e_ij, where w_ij = 1 if nodes i and j in a network are identical and a_ij = 1, otherwise w_ij = 0

  • We used two well-studied methods to establish hierarchical arc-merging (HAM) identification accuracy and performance efficiency baselines that fit with the four criteria: the Louvain method, which has a reputation for dealing successfully with a network consisting of 1 billion edges using a PC machine [34], and the Infomap information theory-based method, based on its history of producing optimum normalized mutual information (NMI) results for LFR benchmark networks [45]


Summary

Introduction

Many real-world systems can be expressed as networks consisting of nodes connected by edges [1,2,3]. In a scientific collaboration network, for example, nodes and edges respectively represent scientists and collaborations among scientists on published academic papers. Network topology is represented as an adjacency matrix A = {a_ij} with a_ij ∈ R^n, where a_ij = 1 if an edge e_ij exists between nodes i and j, otherwise a_ij = 0. w_ij = w_ji denotes the weight of an edge e_ij, where w_ij = 1 if nodes i and j in a network are identical and a_ij = 1, otherwise w_ij = 0. The most common approach for determining the weight w_ij of an edge e_ij is to count the common neighbors of nodes i and j, that is, w_ij = w_ji = S_cn(i,j), as in (1). A high weight indicates a high degree of similarity and structural equivalence (i.e., connected nodes sharing large numbers of common neighbors). S_cn can be extended to other similarity measures by dividing it by different denominators, giving cosine similarity, the Jaccard index, and minimum similarity.
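
The edge-weighting idea described above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the toy edge list, and the choice of open neighborhoods (a node is not counted as its own neighbor) are assumptions made for illustration, and the denominators follow the commonly used textbook forms of cosine similarity, the Jaccard index, and minimum similarity rather than the paper's own equations, which are not reproduced in this excerpt.

```python
import math


def neighbor_sets(edges):
    """Build {node: set of neighbors} from an undirected edge list."""
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)
    return nbrs


def s_cn(nbrs, i, j):
    """Common-neighbor count: |N(i) & N(j)|."""
    return len(nbrs[i] & nbrs[j])


def s_cosine(nbrs, i, j):
    """Common neighbors divided by sqrt(k_i * k_j), where k is node degree."""
    return s_cn(nbrs, i, j) / math.sqrt(len(nbrs[i]) * len(nbrs[j]))


def s_jaccard(nbrs, i, j):
    """Common neighbors divided by |N(i) | N(j)|."""
    return s_cn(nbrs, i, j) / len(nbrs[i] | nbrs[j])


def s_min(nbrs, i, j):
    """Common neighbors divided by min(k_i, k_j)."""
    return s_cn(nbrs, i, j) / min(len(nbrs[i]), len(nbrs[j]))


# Hypothetical toy network, used only to exercise the functions.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
nbrs = neighbor_sets(edges)

# Weight every existing edge by its common-neighbor count.
weights = {(i, j): s_cn(nbrs, i, j) for i, j in edges}
```

In this toy network, nodes 1 and 2 share neighbors 0 and 3, so the edge (1, 2) gets the highest common-neighbor weight, while the edge (3, 4) gets weight 0 because node 4 has no neighbor other than node 3. Every node that appears in the edge list has at least one neighbor, so none of the denominators can be zero.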

