Link Clustering with Extended Link Similarity and EQ Evaluation Division

Lan Huang, Yan Wang, Enrico Blanzieri, Chao Su, Guishen Wang, Rodrigo Huerta-Quintanilla

doi:10.1371/journal.pone.0066005

Open Access

Abstract

Link Clustering (LC) is a relatively new method for detecting overlapping communities in networks. LC first derives a transform matrix whose elements are the similarities between neighbor links, computed with the Jaccard distance; it then applies hierarchical clustering to the transform matrix and uses partition density, evaluated on the resulting dendrogram, to determine the cut level that gives the best community detection. However, the original LC method does not consider the similarity of non-neighbor links, and partition density tends to split the network into many small communities. In this paper, we propose an Extended Link Clustering method (ELC) for overlapping community detection. ELC employs a new link similarity, Extended Link Similarity (ELS), to produce a denser transform matrix, and uses the maximum value of EQ (an extended modularity quality measure) to choose where to cut the dendrogram, which yields a better partition of the original network. Because ELS uses more link information, the resulting transform matrix provides a better basis for clustering and analysis. Further, by using the EQ value to select the cut level of the hierarchical clustering dendrogram, we obtain communities that are more reasonable than those obtained with partition density. Experiments on five real-world networks and on artificially generated networks show that ELC achieves higher EQ and In-group Proportion (IGP) values. Additionally, the communities are more realistic than those generated by either the original LC method or the classical Clique Percolation Method (CPM).
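
As an illustration of the two ingredients of the original LC method mentioned in the abstract, the following minimal Python sketch computes the similarity of neighbor links and the partition density of a candidate link partition. It assumes the commonly used formulation of LC, in which two links that share a node are compared by the Jaccard index of the inclusive neighborhoods (a node plus its neighbors) of their unshared endpoints, and in which partition density is the link-weighted average of each community's normalized link density. The function names and the toy network are hypothetical and chosen for illustration only; the sketch does not cover ELS or EQ, whose definitions are not given in the abstract.

```python
from itertools import combinations


def inclusive_neighbors(edges):
    """Map each node to the set holding the node itself and its neighbors."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def link_similarity(edges):
    """Jaccard similarity for each pair of links sharing exactly one node.

    Assumes a simple undirected graph (no self-loops, no duplicate edges).
    Pairs of non-neighbor links are skipped, as in the original LC method.
    """
    nbrs = inclusive_neighbors(edges)
    sims = {}
    for e1, e2 in combinations(edges, 2):
        shared = set(e1) & set(e2)
        if len(shared) != 1:
            continue  # the two links do not share a node
        (i,) = set(e1) - shared  # unshared endpoint of the first link
        (j,) = set(e2) - shared  # unshared endpoint of the second link
        sims[(e1, e2)] = len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])
    return sims


def partition_density(link_communities, total_links):
    """Partition density D = (2/M) * sum_c m_c (m_c - (n_c - 1)) / ((n_c - 2)(n_c - 1))."""
    total = 0.0
    for links in link_communities:
        m = len(links)                                    # links in community
        n = len({node for link in links for node in link})  # nodes they touch
        if n > 2:  # a single-link community contributes zero by convention
            total += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * total / total_links


# Toy network (hypothetical): two triangles that overlap at node "c".
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"), ("d", "e"), ("c", "e")]
sims = link_similarity(edges)
two_triangles = [edges[:3], edges[3:]]
density = partition_density(two_triangles, len(edges))
```

In this toy network, links inside the same triangle receive a higher similarity than links from different triangles, and node "c" ends up in both link communities, which is how clustering links rather than nodes produces overlapping communities.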

Highlights

The need for community structure detection originates from the study of complex networks [1], [2], and aims to identify a system of sub-networks whose nodes are tightly linked via the original network topology.
Each figure is devoted to a single dataset and comprises the transform matrices and dendrograms of the Extended Link Clustering method (ELC) and Link Clustering (LC), the communities found by them and by the Clique Percolation Method (CPM), and the corresponding values of EQ, PD, In-group Proportion (IGP), community number (CN), cover rate (CR) and uncovered nodes (UN).
Dolphin dataset results: From Figure 4(A) and Figure 4(C), we can see that the transform matrix generated by LC is unclear and not very informative, while the ELC transform matrix clearly shows a network divided into three large clusters.

Summary

Introduction

The need for community structure detection originates from the study of complex networks [1], [2], and aims to identify a system of sub-networks (or communities) whose nodes are tightly linked via the original network topology. When the community structure of a network is already known, it can be represented as an attribute of the nodes, as in the case of artificially generated networks [1,4,10]. This is also true for some real-world networks used as testing benchmarks, for example Zachary’s karate club network [1,2,3,4] and the US college football network [1,2,3,4]. When more than one community exists, the community structure can take one of three forms. It can be disjoint (the communities have no nodes in common), as in a social network representing exclusive social groupings by interest or background [1,2,3,4]. It can be hierarchical (one community includes another), as in the hierarchical organization of modularity in metabolic networks [11]. Or it can be overlapping (two communities may have some nodes in common), as when a large fraction of proteins belong to several protein complexes simultaneously [12].

Methods

Results

Conclusion