Abstract

In many domains it is necessary to generate surrogate networks, e.g., for testing hypotheses about properties of a network. Generating surrogate networks typically requires that certain properties of the original network be preserved, e.g., edges may not be added or deleted and edge weights may be restricted to certain intervals. In this paper we present an efficient property-preserving Markov chain Monte Carlo method termed CycleSampler for generating surrogate networks in which (1) edge weights are constrained to intervals and vertex strengths are preserved exactly, or (2) edge weights and vertex strengths are both constrained to intervals. These two types of constraints cover a wide variety of practical use cases. The method is applicable to both undirected and directed graphs. We empirically demonstrate the efficiency of the CycleSampler method on real-world data sets. We provide an implementation of CycleSampler in R, with parts implemented in C.
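To make constraint type (1) concrete, the following Python sketch shows one way a weight change can keep every vertex strength fixed while each edge weight stays inside its interval: along a cycle with an even number of edges, alternately add and subtract the same amount. This is an illustration under stated assumptions, not the CycleSampler implementation (which is in R and C and is not described in this excerpt); the function names, the toy cycle, and the choice of drawing the shift uniformly are all hypothetical.

```python
import random


def feasible_delta_range(weights, intervals):
    """Return (lo, hi) such that any delta in [lo, hi] keeps every edge weight
    inside its interval when even-indexed edges gain delta and odd-indexed
    edges lose delta."""
    lo_d, hi_d = float("-inf"), float("inf")
    for i, (w, (lo, hi)) in enumerate(zip(weights, intervals)):
        if i % 2 == 0:  # this edge becomes w + delta
            lo_d = max(lo_d, lo - w)
            hi_d = min(hi_d, hi - w)
        else:  # this edge becomes w - delta
            lo_d = max(lo_d, w - hi)
            hi_d = min(hi_d, w - lo)
    return lo_d, hi_d


def perturb_even_cycle(weights, intervals, rng=random):
    """Shift the weights along an even-length cycle by +delta, -delta, ...
    with delta drawn uniformly from the feasible range (an assumption made
    for illustration)."""
    if len(weights) % 2 != 0:
        raise ValueError("alternating signs only cancel on an even cycle")
    lo_d, hi_d = feasible_delta_range(weights, intervals)
    delta = rng.uniform(lo_d, hi_d)
    return [w + delta if i % 2 == 0 else w - delta
            for i, w in enumerate(weights)]


def vertex_strengths(cycle_weights):
    """Strength of vertex i on the cycle, where edge i joins vertex i and
    vertex i + 1 (indices modulo the cycle length)."""
    n = len(cycle_weights)
    return [cycle_weights[i - 1] + cycle_weights[i] for i in range(n)]


# Toy 4-cycle: every weight must stay in [0, 5].
weights = [1.0, 2.0, 3.0, 4.0]
intervals = [(0.0, 5.0)] * 4
new_weights = perturb_even_cycle(weights, intervals, random.Random(0))
```

Each vertex on the cycle touches one edge that gains delta and one that loses it, so its strength is unchanged, and the feasible range guarantees that no weight leaves its interval. Whether repeating such steps yields a uniform sample depends on details this sketch does not address.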

Highlights

  • In many applications it is useful to represent relationships between objects with a network in which vertices correspond to objects of interest and associations between objects are expressed with directed or undirected edges.

  • Such networks are often sparse and large, and generating surrogate networks that adhere to specific constraints is a difficult problem.

  • We provide an open-source implementation of the CycleSampler method.


Introduction

In many applications it is useful to represent relationships between objects with a network in which vertices correspond to objects of interest and associations between objects are expressed with directed or undirected edges. Given such a network, one might be interested in analyses such as community detection [1], clustering coefficients [2,3], centrality measures [4], shortest path distributions [5], or different measures of information propagation [6]. It is often useful to study whether a possibly interesting finding from a given network reflects a real phenomenon or is merely caused by, e.g., noise or systematic errors. A simple approach is to compare the original finding to findings from surrogate networks that share some relevant properties with the original network but are otherwise inherently “random.” For example, communities found in the original network should probably exhibit greater structure than communities in appropriately randomized networks. The usual solution is to fix some network properties of interest and draw a uniform sample of surrogate networks from the set of all networks satisfying those properties.
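The comparison between an original finding and findings from surrogate networks is commonly summarized as an empirical p-value. The Python sketch below shows one standard estimator, assuming the finding has been reduced to a single number where larger means "more structure"; the function name and the example values are hypothetical and do not come from the paper.

```python
def empirical_p_value(observed, surrogate_stats):
    """One-sided empirical p-value: the fraction of networks (surrogates plus
    the original itself) whose statistic is at least as large as the observed
    one. The +1 terms count the original network as one of the samples, so
    the estimate is never exactly zero."""
    n_extreme = sum(1 for s in surrogate_stats if s >= observed)
    return (1 + n_extreme) / (1 + len(surrogate_stats))


# Hypothetical statistic values: one original network, four surrogates.
observed_stat = 0.9
surrogate_values = [0.1, 0.2, 0.95, 0.3]
p = empirical_p_value(observed_stat, surrogate_values)
```

A small value suggests that the finding is unlikely to arise in networks that share only the preserved properties. The estimate is meaningful only if the surrogates are drawn from the intended null distribution, which is why uniform sampling matters.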
