Efficient and Exact Sampling of Simple Graphs with Given Arbitrary Degree Sequence

Charo I Del Genio, Kevin E Bassler, Hyunju Kim, Zoltán Toroczkai, Fabio Rapallo

doi:10.1371/journal.pone.0010012

Open Access

Journal: PLoS ONE
Publication Date: Apr 8, 2010
Citations: 158
License type: CC BY 4.0

Affiliation: University of Houston, University of Notre Dame

Abstract

Uniform sampling from graphical realizations of a given degree sequence is a fundamental component in simulation-based measurements of network observables, with applications ranging from epidemics, through social networks, to Internet modeling. Existing graph sampling methods are either link-swap based (Markov-Chain Monte Carlo algorithms) or stub-matching based (the Configuration Model). Both types are ill-controlled, with typically unknown mixing times for link-swap methods and uncontrolled rejections for the Configuration Model. Here we propose an efficient, polynomial-time algorithm that generates statistically independent graph samples with a given, arbitrary degree sequence. The algorithm provides a weight associated with each sample, allowing the observable to be measured either uniformly over the graph ensemble or, alternatively, with a desired distribution. Unlike other algorithms, this method always produces a sample, without back-tracking or rejections. Using central limit theorem-based reasoning, we argue that for large N, and for degree sequences admitting many realizations, the sample weights are expected to have a lognormal distribution. As examples, we apply our algorithm to generate networks with degree sequences drawn from power-law distributions and from binomial distributions.
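
The role of the per-sample weights can be illustrated with a short sketch. It assumes the usual self-normalized weighted average, in which an observable Q measured on M samples with weights w_i is estimated as the sum of w_i Q_i divided by the sum of w_i. The function name weighted_mean and the choice to pass logarithms of the weights are assumptions made for this illustration, not part of the paper.

```python
import math


def weighted_mean(values, log_weights):
    """Self-normalized weighted average of an observable.

    values      -- the observable measured on each sampled graph
    log_weights -- natural log of each sample's weight (hypothetical input
                   format; logs are used because the weights may span many
                   orders of magnitude)
    """
    if not values or len(values) != len(log_weights):
        raise ValueError("values and log_weights must be non-empty and equal in length")

    # Subtract the largest log-weight before exponentiating so that
    # exp() cannot overflow; the common factor cancels in the ratio.
    shift = max(log_weights)
    scaled = [math.exp(lw - shift) for lw in log_weights]

    numerator = sum(w * q for w, q in zip(scaled, values))
    denominator = sum(scaled)
    return numerator / denominator


if __name__ == "__main__":
    # Two samples with weights 1 and 3 and observable values 1.0 and 3.0.
    print(weighted_mean([1.0, 3.0], [math.log(1.0), math.log(3.0)]))
```

Working with logarithms is a design choice suggested by the expected lognormal spread of the weights: the ratio is unchanged by a common rescaling, so the largest weight can be factored out safely.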

Highlights

Network representation has become an increasingly widespread methodology of analysis to gain insight into the behavior of complex systems, ranging from gene regulatory networks to human infrastructures such as the Internet, power-grids and airline transportation, through metabolism, epidemics and social sciences [1,2,3,4].
A statistical mechanics approach [1] can be employed to characterize the collective properties of the system emerging from its node-level properties.
Each time the procedure is repeated, the degree sequence D considered is the "residual degree sequence", that is, the original degree sequence reduced by the links that have previously been made, and with any zero residual degree node removed from the sequence.
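
The residual degree sequence described above can be sketched in a few lines. This is only an illustration of the bookkeeping, not the paper's sampling algorithm; the function name residual_degree_sequence and the dictionary-based representation are assumptions made for the example.

```python
def residual_degree_sequence(degrees, links):
    """Return the residual degrees after some links have been placed.

    degrees -- dict mapping each node to its target degree
    links   -- iterable of (u, v) pairs already connected
    Nodes whose residual degree reaches zero are dropped from the result.
    """
    residual = dict(degrees)
    for u, v in links:
        if u == v:
            raise ValueError("self-loops are not allowed in a simple graph")
        # Each placed link uses up one unit of degree at both endpoints.
        residual[u] -= 1
        residual[v] -= 1

    if any(d < 0 for d in residual.values()):
        raise ValueError("links exceed the degree of at least one node")

    # Remove nodes that have no remaining connections to make.
    return {node: d for node, d in residual.items() if d > 0}


if __name__ == "__main__":
    degrees = {"a": 2, "b": 2, "c": 1, "d": 1}
    placed = [("a", "b"), ("a", "c")]
    print(residual_degree_sequence(degrees, placed))
```

In the example, nodes a and c are fully connected after the two placed links, so only b and d remain, each with one link still to be made.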

Summary

Introduction

Network representation has become an increasingly widespread methodology of analysis to gain insight into the behavior of complex systems, ranging from gene regulatory networks to human infrastructures such as the Internet, power-grids and airline transportation, through metabolism, epidemics and social sciences [1,2,3,4]. These studies are primarily data driven: connectivity information is collected, and the structural properties of the resulting graphs are analyzed for modeling purposes. Epidemiologists, for example, are faced with constructing a typical contact graph having the observed degree sequence, on which disease spread scenarios can be tested. Another reason for studying classes or ensembles of graphs obeying constraints comes from the fact that the network structure of many large-scale real-world systems is not the result of a global design, but of complex dynamical processes with many stochastic elements. We focus on the degree as a node characteristic, which could represent, for example, the number of friends of a person, the valence of an atom in a chemical compound, or the number of clients of a router.

Methods

Results

Conclusion