Compressing Networks with Super Nodes

Natalie Stanley, Peter J. Mucha, Roland Kwitt, Marc Niethammer

doi:10.1038/s41598-018-29174-3

Open Access

https://doi.org/10.1038/s41598-018-29174-3

Abstract

Community detection is a commonly used technique for identifying groups in a network based on similarities in connectivity patterns. To facilitate community detection in large networks, we recast the network as a smaller network of ‘super nodes’, where each super node comprises one or more nodes of the original network. We can then use this super node representation as the input into standard community detection algorithms. To define the seeds, or centers, of our super nodes, we apply the ‘CoreHD’ ranking, a technique applied in network dismantling and decycling problems. We test our approach through the analysis of two common methods for community detection: modularity maximization with the Louvain algorithm and maximum likelihood optimization for fitting a stochastic block model. Our results highlight that applying community detection to the compressed network of super nodes is significantly faster while successfully producing partitions that are more aligned with the local network connectivity and more stable across multiple (stochastic) runs within and between community detection algorithms, yet still overlap well with the results obtained using the full network.
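As a rough illustration of the seed-ranking step, the following is a minimal sketch of a CoreHD-style ranking as it is commonly described in the network dismantling literature: repeatedly remove the highest-degree node of the 2-core until no 2-core remains. The function names `two_core` and `corehd_ranking`, the adjacency-dictionary input, and the tie-breaking rule (smallest label first) are assumptions made for this example and are not taken from the paper.

```python
def two_core(adj):
    """Return the 2-core of an undirected graph given as {node: set(neighbors)}."""
    core = {u: set(vs) for u, vs in adj.items()}
    # Peel off nodes of degree < 2 until none remain.
    stack = [u for u, vs in core.items() if len(vs) < 2]
    while stack:
        u = stack.pop()
        if u not in core:
            continue  # already removed via an earlier stack entry
        for v in core.pop(u):
            core[v].discard(u)
            if len(core[v]) < 2:
                stack.append(v)
    return core


def corehd_ranking(adj):
    """Order in which nodes are removed by a CoreHD-style procedure."""
    core = two_core(adj)
    order = []
    while core:
        # Highest degree within the current 2-core; ties go to the smallest label.
        u = max(sorted(core), key=lambda n: len(core[n]))
        for v in core.pop(u):
            core[v].discard(u)
        order.append(u)
        core = two_core(core)  # re-prune after the removal
    return order


# Two triangles sharing node 0, plus a pendant node 5 attached to node 1.
bowtie = {
    0: {1, 2, 3, 4},
    1: {0, 2, 5},
    2: {0, 1},
    3: {0, 4},
    4: {0, 3},
    5: {1},
}
ranking = corehd_ranking(bowtie)
```

Under this sketch, the earliest-removed nodes would be natural candidates for super node seeds. How the remaining nodes are then assigned to seeds is not specified in the abstract and is left out here.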

Highlights

Networks appear across disciplines as natural data structures for modeling relational definitions between entities, such as regulatory interactions between genes and proteins, and social connections between people.
The super node representation produces a weighted network whose edge weights are counts computed from the original network. Both community detection algorithms considered (Louvain modularity maximization and stochastic block model fitting) can accommodate such edge weights.
Creating a super node representation of the network with S = 600 leads to the maximum value of the normalized mutual information, NMI(zFull, zSN), among all values of S we considered in our experiments.
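To make the count-based edge weights concrete, here is a minimal sketch of how a weighted super node network could be assembled once every original node has been assigned to a super node. The function name `compress_network`, the `membership` dictionary, and the choice to keep within-super-node edges as self-loop weights are assumptions for illustration; the paper's own construction may differ.

```python
from collections import Counter


def compress_network(edges, membership):
    """Count original edges between (and within) super nodes.

    edges: iterable of (u, v) pairs from the original undirected network.
    membership: dict mapping each original node to its super node label.
    Returns {(a, b): count} with a <= b, so each undirected pair appears once.
    """
    weights = Counter()
    for u, v in edges:
        a, b = membership[u], membership[v]
        key = (a, b) if a <= b else (b, a)
        weights[key] += 1
    return dict(weights)


# Toy network: a 4-cycle with one chord, compressed into two super nodes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
membership = {0: "A", 1: "A", 2: "B", 3: "B"}
super_edges = compress_network(edges, membership)
```

The resulting dictionary can be passed as a weighted edge list to any community detection routine that accepts edge weights.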

Summary

Introduction

Networks appear across disciplines as natural data structures for modeling relational definitions between entities, such as regulatory interactions between genes and proteins, and social connections between people.

To speed up segmentation for large images, a popular approach is to avoid computing segmentations at the pixel level and instead reformulate the segmentation problem based on larger-scale image primitives that are likely part of the same partition. This can be accomplished by super pixels, which aggregate pixels together in a way that faithfully adheres to image boundaries, maintaining or improving segmentation accuracy[12]. Authors have typically judged the quality of a super pixel representation of the original image on two criteria. They seek to minimize undersegmentation error[13], which quantifies the extent to which the super pixels bleed across original boundaries in the image.

For partitions zFull and zSN with R and C communities, respectively, let N be the R × C contingency table, where Nij counts the nodes shared by community i in zFull and community j in zSN. The NMI between the two partitions is denoted NMI(zFull, zSN).
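As an illustration of how an NMI score can be computed from the contingency counts, the sketch below uses one common normalization, 2·I / (H(zFull) + H(zSN)), where I is the mutual information and H the entropy of each partition. This normalization is an assumption: the text here does not state which variant the authors use. The sketch also assumes both label vectors are indexed by the same original nodes, meaning the super node partition has already been mapped back to the original network.

```python
from collections import Counter
from math import log


def nmi(z_full, z_sn):
    """Normalized mutual information between two label vectors of equal length."""
    n = len(z_full)
    joint = Counter(zip(z_full, z_sn))  # contingency counts N_ij
    row = Counter(z_full)               # row sums: community sizes in z_full
    col = Counter(z_sn)                 # column sums: community sizes in z_sn

    # Mutual information computed from the non-zero contingency entries.
    mi = sum(
        c / n * log(c * n / (row[i] * col[j]))
        for (i, j), c in joint.items()
    )
    h_row = -sum(c / n * log(c / n) for c in row.values())
    h_col = -sum(c / n * log(c / n) for c in col.values())

    if h_row + h_col == 0:
        return 1.0  # both partitions place every node in a single community
    return 2 * mi / (h_row + h_col)


same_up_to_labels = nmi([0, 0, 1, 1], ["b", "b", "a", "a"])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

With this convention, partitions that agree up to a relabeling of communities score 1, and statistically independent partitions score 0.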

Objectives

Methods

Results

Conclusion