Instability of clustering metrics in overlapping community detection algorithms

Diego Kiedanski, Pablo Rodriguez-Bocca

doi:10.1109/clei53233.2021.9640094

Abstract

In this paper, we study the impact of data complexity and data quality on the overlapping community detection problem. We show that community detection algorithms are highly unstable in the presence of incomplete or erroneous data, and that this result holds across all the evaluated performance metrics. We verify it using three quality metrics (F1, NMI, and Omega), which apply when the ground-truth community structure is known, on four popular and representative detection algorithms: Order Statistics Local Optimization Method (OSLOM), Greedy Clique Expansion (GCE) algorithm, Speaker-listener Label Propagation Algorithm (SLPA), and Cluster Affiliation Model for Big Networks (BIG-CLAM). We evaluate the algorithms on a set of real instances that arise from detecting which courses belong to the different careers (degrees) of an engineering university, and on large benchmark sets of synthetic instances frequently used in the literature.
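
As an illustration of how a ground-truth metric of this kind reacts to degraded output, the following minimal sketch computes an average F1 score between two overlapping covers. It assumes the best-match, two-directional averaging variant that is common in the overlapping community detection literature; the abstract does not state which F1 definition is used, and the function names and toy covers below are hypothetical.

```python
from typing import List, Set


def f1(a: Set[int], b: Set[int]) -> float:
    """F1 score between two node sets (harmonic mean of precision and recall)."""
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


def average_f1(truth: List[Set[int]], found: List[Set[int]]) -> float:
    """Assumed variant: match each community to its best counterpart,
    average in both directions, then take the mean of the two averages."""
    if not truth or not found:
        return 0.0
    truth_side = sum(max(f1(t, d) for d in found) for t in truth) / len(truth)
    found_side = sum(max(f1(d, t) for t in truth) for d in found) / len(found)
    return 0.5 * (truth_side + found_side)


# Hypothetical toy covers: node 4 belongs to both ground-truth communities.
ground_truth = [{1, 2, 3, 4}, {4, 5, 6, 7}]
perfect = [{1, 2, 3, 4}, {4, 5, 6, 7}]
degraded = [{1, 2, 3}, {4, 5, 6, 7}]  # the overlap on node 4 is lost

score_perfect = average_f1(ground_truth, perfect)
score_degraded = average_f1(ground_truth, degraded)
print(score_perfect, score_degraded)
```

In this toy case, losing a single overlapping membership already lowers the score below 1, which is the kind of sensitivity the stability analysis is concerned with.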

Full Text