Abstract

Mining complex data in the form of networks is of increasing interest in many scientific disciplines. Network communities correspond to densely connected subnetworks, and often represent key functional parts of real-world systems. This paper proposes embedding-based Silhouette community detection (SCD), an approach for detecting communities based on clustering of network node embeddings, i.e. real-valued representations of nodes derived from their neighborhoods. We investigate the performance of the proposed SCD approach on 234 synthetic networks, as well as on a real-life social network. Even though SCD is not based on any form of modularity optimization, it performs comparably to or better than state-of-the-art community detection algorithms, such as InfoMap and Louvain. Further, we demonstrate that SCD’s outputs can be used along with domain ontologies in semantic subgroup discovery, yielding human-understandable explanations of communities detected in a real-life protein interaction network. Being embedding-based, SCD is widely applicable and can be tested out-of-the-box as part of many existing network learning and exploration pipelines.

Highlights

  • Mining complex data in the form of networks is of increasing interest in many scientific disciplines: social, biological, manufacturing and similar systems can be represented and analyzed using network-based approaches

  • We believe performance with respect to the mixing parameter of LFR graphs is of crucial importance, as it offers insight into how the considered community detection algorithms behave when communities are more or less well defined

  • We show the results of semantic subgroup discovery on the Affinome protein interaction network

Summary

Introduction

Mining complex data in the form of networks is of increasing interest in many scientific disciplines: social, biological, manufacturing and similar systems can be represented and analyzed using network-based approaches. Embedding technology offers advances in representation learning (Zhang et al 2018) from different data formats, including learning representations of network data, such as network node embeddings (Cai et al 2018). Even though such embeddings are commonly used for supervised learning tasks, such as node classification and link prediction, less attention has been devoted to studying how the latent organization of a network can be automatically extracted from node embeddings in an unsupervised manner. Real-world complex networks are commonly investigated in terms of their mesoscale topological structure, such as communities (Harenberg et al 2014). Subgroup discovery (SD) searches for rules of the form TargetClass ← Explanation. These rules are traditionally learned via coverage-based approaches (Fürnkranz et al 2012).
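
The general idea of extracting communities from node embeddings without supervision can be illustrated with a minimal sketch: cluster the embeddings for several candidate numbers of communities and keep the partition with the highest Silhouette score. The text above does not specify the clustering algorithm or the embedding method, so everything below is an assumption made for illustration: the k-means routine with farthest-point initialisation, the helper names (kmeans, silhouette, pick_k) and the two-dimensional toy "embeddings" are all hypothetical and are not the authors' implementation.

```python
import math

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # next centre: the point farthest from its nearest existing centre
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centre in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean Silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("Silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)

def pick_k(points, candidates):
    """Return (k, labels) for the candidate k with the highest Silhouette score."""
    scored = []
    for k in candidates:
        labels = kmeans(points, k)
        scored.append((silhouette(points, labels), k, labels))
    _, k, labels = max(scored, key=lambda t: t[0])
    return k, labels

# Hypothetical 2-D "node embeddings": two well-separated groups of three nodes.
embeddings = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best_k, best_labels = pick_k(embeddings, candidates=[2, 3, 4])
print(best_k, best_labels)
```

In practice the toy coordinates would be replaced by embeddings produced by a network embedding method, and the candidate range for k would be chosen to match the expected number of communities.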

