Ensemble clustering for graphs: comparisons and applications

Valérie Poulin, François Théberge

doi:10.1007/s41109-019-0162-z

Abstract

We recently proposed a new ensemble clustering algorithm for graphs (ECG) based on the concept of consensus clustering. In this paper, we provide experimental evidence for the claim that ECG alleviates the well-known resolution limit issue, and that it leads to better stability of the partitions. We propose a community strength index based on ECG results to help quantify the presence of community structure in a graph. We perform a wide range of experiments over both synthetic and real graphs, showing the usefulness of ECG over a variety of problems. In particular, we consider measures based on node partitions as well as on the topological structure of the communities, and we apply ECG to community-aware anomaly detection. Finally, we show that ECG can be used in a semi-supervised context to zoom in on the sub-graph most closely associated with seed nodes.

Highlights

Most networks that arise in nature exhibit complex structure (Girvan and Newman 2002; Newman 2003) with subsets of nodes densely interconnected relative to the rest of the network, which we call communities or clusters.
In a recent study (Yang et al 2016), several state-of-the-art algorithms implemented in the igraph (Csardi and Nepusz 2006) package were compared over a wide range of artificial networks generated via the LFR benchmark (Lancichinetti et al 2008), using some cluster comparison measures.
In the “Background knowledge” section, we briefly describe the ensemble clustering algorithm for graphs (ECG), the LFR benchmark, and the cluster comparison measures used.

Summary

Introduction

Most networks that arise in nature exhibit complex structure (Girvan and Newman 2002; Newman 2003) with subsets of nodes densely interconnected relative to the rest of the network, which we call communities or clusters. Graph clustering aims at finding a partition of the nodes, V = C1 ∪ …, into such clusters. In a recent study (Yang et al 2016), several state-of-the-art algorithms implemented in the igraph (Csardi and Nepusz 2006) package were compared over a wide range of artificial networks generated via the LFR benchmark (Lancichinetti et al 2008), using some cluster comparison measures.

For unweighted graphs, we let w(e) = 1 for all e ∈ E. We use 1_{Cij}(v) to denote the indicator function for v ∈ Cij, which equals 1 if v belongs to the set Cij and 0 otherwise.
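To make the notation concrete, the following minimal sketch shows how the indicator function and unit edge weights might be used in a consensus-style computation. It is an illustration only, not the authors' ECG implementation: the names `indicator`, `co_membership`, and `consensus_weights`, the toy graph, and the hand-written ensemble of partitions are all assumptions introduced here. For each edge, the sketch computes the fraction of partitions in which both endpoints fall in the same cluster, which is one common way to summarize agreement across an ensemble.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]
Partition = List[Set[Node]]  # one partition of V, given as a list of disjoint clusters


def indicator(cluster: Set[Node], v: Node) -> int:
    """Indicator function 1_C(v): 1 if v belongs to cluster C, else 0."""
    return 1 if v in cluster else 0


def co_membership(partition: Partition, u: Node, v: Node) -> int:
    """Return 1 if u and v share a cluster in this partition, else 0.

    Because clusters in a partition are disjoint, at most one term
    of the sum below is non-zero.
    """
    return sum(indicator(c, u) * indicator(c, v) for c in partition)


def consensus_weights(
    edges: Sequence[Edge], partitions: Sequence[Partition]
) -> Dict[Edge, float]:
    """Fraction of partitions in which both endpoints of each edge co-cluster."""
    k = len(partitions)
    return {
        (u, v): sum(co_membership(p, u, v) for p in partitions) / k
        for (u, v) in edges
    }


# Hypothetical toy path graph 0 - 1 - 2 - 3, unweighted: w(e) = 1 for all e in E.
edges: List[Edge] = [(0, 1), (1, 2), (2, 3)]
unit_weights = {e: 1 for e in edges}

# Hypothetical ensemble of four partitions of V = {0, 1, 2, 3}.
partitions: List[Partition] = [
    [{0, 1}, {2, 3}],
    [{0, 1, 2}, {3}],
    [{0}, {1}, {2, 3}],
    [{0, 1}, {2}, {3}],
]

weights = consensus_weights(edges, partitions)
for edge, w in weights.items():
    print(edge, w)
```

In this toy ensemble, the edge (0, 1) co-clusters in three of the four partitions, while (1, 2) does so in only one, so the derived weights separate edges that the ensemble consistently keeps together from edges it tends to cut.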

Methods

Results

Conclusion