Abstract

We recently proposed a new ensemble clustering algorithm for graphs (ECG) based on the concept of consensus clustering. In this paper, we provide experimental evidence for the claim that ECG alleviates the well-known resolution limit issue and that it leads to more stable partitions. We propose a community strength index based on ECG results to help quantify the presence of community structure in a graph. We perform a wide range of experiments over both synthetic and real graphs, showing the usefulness of ECG over a variety of problems. In particular, we consider measures based on node partitions as well as on the topological structure of the communities, and we apply ECG to community-aware anomaly detection. Finally, we show that ECG can be used in a semi-supervised context to zoom in on the sub-graph most closely associated with seed nodes.

Highlights

  • Most networks that arise in nature exhibit complex structure (Girvan and Newman 2002; Newman 2003) with subsets of nodes densely interconnected relative to the rest of the network, which we call communities or clusters

  • In a recent study (Yang et al 2016), several state-of-the-art algorithms implemented in the igraph package (Csardi and Nepusz 2006) were compared over a wide range of artificial networks generated via the LFR benchmark (Lancichinetti et al 2008), using some cluster comparison measures

  • We briefly describe the ensemble clustering algorithm for graphs (ECG), the LFR benchmark and the cluster comparison measures used in the “Background knowledge” section


Summary

Introduction

Most networks that arise in nature exhibit complex structure (Girvan and Newman 2002; Newman 2003), with subsets of nodes densely interconnected relative to the rest of the network; we call these subsets communities or clusters. Graph clustering aims at finding a partition of the nodes V = C1 ∪ … In a recent study (Yang et al 2016), several state-of-the-art algorithms implemented in the igraph package (Csardi and Nepusz 2006) were compared over a wide range of artificial networks generated via the LFR benchmark (Lancichinetti et al 2008), using some cluster comparison measures. For unweighted graphs, we let w(e) = 1 for all e ∈ E. We use 1Cij(v) to denote the indicator function for v ∈ Cij.
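The notation above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the ECG implementation: the helper names (`is_partition`, `unit_weights`, `indicator`) and the toy graph are hypothetical and do not come from the paper or from igraph. It only shows a partition of V into clusters, the convention w(e) = 1 for unweighted graphs, and the indicator function of a cluster.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def is_partition(nodes: Iterable[Hashable], clusters: List[Set[Hashable]]) -> bool:
    """Return True if the clusters are non-empty, pairwise disjoint and cover all nodes."""
    seen: Set[Hashable] = set()
    for cluster in clusters:
        if not cluster or seen & cluster:
            return False  # empty cluster, or overlap with an earlier cluster
        seen |= cluster
    return seen == set(nodes)


def unit_weights(edges: Iterable[Edge]) -> Dict[Edge, float]:
    """Unweighted-graph convention: w(e) = 1 for every edge e in E."""
    return {e: 1.0 for e in edges}


def indicator(cluster: Set[Hashable], v: Hashable) -> int:
    """Indicator function of a cluster: 1 if v belongs to it, 0 otherwise."""
    return 1 if v in cluster else 0


# Hypothetical toy graph: two triangles joined by a single edge.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
clusters = [{0, 1, 2}, {3, 4, 5}]

weights = unit_weights(edges)
valid = is_partition(nodes, clusters)
```

Here `clusters` plays the role of C1, C2 in the partition of V, and `indicator(clusters[0], v)` evaluates the indicator of the first cluster at node v.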

