On neighbourhood degree sequences of complex networks

Keith M Smith

doi:10.1038/s41598-019-44907-8

Abstract

Network topology is a fundamental aspect of network science that allows us to gather insights into the complicated relational architectures of the world we inhabit. We provide a first specific study of neighbourhood degree sequences in complex networks. We consider how to explicitly characterise important physical concepts such as similarity, heterogeneity and organisation in these sequences, and we update the notion of hierarchical complexity to reflect previously unnoticed organisational principles. We also point out that neighbourhood degree sequences are related to a powerful subtree kernel for unlabelled graph classification. We study these newly defined sequence properties in a comprehensive array of graph models and over 200 real-world networks. We find that these indices are neither highly correlated with each other nor with classical network indices. Importantly, the sequences of a wide variety of real-world networks are found to have greater similarity and organisation than is expected for networks of their given degree distributions. Notably, while biological, social and technological networks all showed consistently large neighbourhood similarity and organisation, hierarchical complexity was not a consistent feature of real-world networks. Neighbourhood degree sequences are an interesting tool for describing unique and important characteristics of complex networks.

Highlights

We study a benchmark dataset of 406 real-world networks used in [47] from the Colorado Index of Complex Networks [48]
Contemplating the roles of components in natural and man-made systems, we begin to realise their diversity
We demonstrated a link between neighbourhood degree sequences and Weisfeiler-Lehman graph subtree kernels [25], which provide powerful graph learning results [26] based on long-standing graph isomorphism results [23]
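The link to Weisfeiler-Lehman relabelling can be illustrated with a small sketch. It assumes, since the definition is not given in this summary, that the neighbourhood degree sequence of a node is the sorted list of its neighbours' degrees, and that the Weisfeiler-Lehman procedure is initialised with node degrees as labels. The function names and the toy graph are hypothetical. Under these assumptions, two nodes receive the same label after one relabelling step exactly when their neighbourhood degree sequences coincide.

```python
def neighbourhood_degree_sequences(adj):
    """Map each node to the sorted degrees of its neighbours.

    `adj` is an undirected graph given as {node: set_of_neighbours}.
    The 'sorted neighbour degrees' definition is an assumption here.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    return {v: tuple(sorted(degree[u] for u in adj[v])) for v in adj}


def wl_first_iteration(adj):
    """One Weisfeiler-Lehman relabelling step with degrees as initial labels.

    Each node's signature is (own label, sorted multiset of neighbour labels),
    which is then compressed to a small integer label.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    signatures = {
        v: (degree[v], tuple(sorted(degree[u] for u in adj[v]))) for v in adj
    }
    palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
    return {v: palette[signatures[v]] for v in adj}


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
sequences = neighbourhood_degree_sequences(toy)
labels = wl_first_iteration(toy)
```

Here nodes 0 and 1 share both a neighbourhood degree sequence and a relabelled class, while nodes 2 and 3 each fall into a class of their own.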

Summary

Methods

We study a benchmark dataset of 406 real-world networks used in [47] from the Colorado Index of Complex Networks [48]. This includes 186 static networks, of which just 3 overlap with the above (dolphin social network, Macaque cortex and the uni email network). The global clustering coefficient, C, measures the ratio of closed to open triples in the network. A triple is a path of length two, {(i, j), (j, k)}, which is closed if (k, i) exists in the network and open otherwise. C is a measure of network segregation. The characteristic path length, L, is the average of the shortest paths existing between all pairs of nodes in the network. It is known as a measure of network integration. Modularity, Q, measures the propensity of nodes to form into highly connected communities which are less connected to the rest of the network [51].
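As a minimal sketch of the first two indices, the following computes C and L for an undirected, unweighted graph stored as a dictionary of neighbour sets. Two choices are assumptions rather than details taken from the text: C is normalised in the common transitivity form (closed triples divided by all triples, closed or open), and L averages only over pairs of nodes that are connected by some path. The function names and the toy graph are hypothetical.

```python
from collections import deque


def global_clustering(adj):
    """Fraction of triples (paths of length two) that are closed.

    Assumed normalisation: closed triples / all triples.
    """
    closed = 0
    total = 0
    for j, nbrs in adj.items():
        nbrs = list(nbrs)
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                total += 1  # triple {(i, j), (j, k)} centred on j
                if nbrs[b] in adj[nbrs[a]]:
                    closed += 1  # edge (k, i) exists, so the triple is closed
    return closed / total if total else 0.0


def characteristic_path_length(adj):
    """Mean shortest-path length over connected node pairs, using BFS."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # reachable nodes other than the source
    return total / pairs if pairs else 0.0


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
C = global_clustering(toy)           # 3 closed triples out of 5
L = characteristic_path_length(toy)  # distances sum to 8 over 6 pairs
```

In the toy graph, three of the five triples are closed, giving C = 0.6, and the six pairwise distances sum to 8, giving L = 4/3.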

Experiments

Limitations and Future

Findings

Conclusion