Abstract

Exponential-family random graph models are probabilistic network models that are parametrized by sufficient statistics based on structural (i.e., graph-theoretic) properties. The ergm package for the R statistical computing environment is a collection of tools for the analysis of network data within an exponential-family random graph model framework. Many different network properties can be employed as sufficient statistics for exponential-family random graph models by using the model terms defined in the ergm package; this functionality can be expanded by creating packages that code for additional network statistics. Here, our focus is on the addition of statistics based on graphlets. Graphlets are classes of small, connected, induced subgraphs that can be used to describe the topological structure of a network. We introduce an R package called ergm.graphlets that enables the use of graphlet properties of a network within the ergm package. The ergm.graphlets package provides a complete list of model terms that allow statistics of any 2-, 3-, 4- and 5-node graphlets to be incorporated into exponential-family random graph models. The new model terms of the ergm.graphlets package enable both exponential-family random graph modeling of global structural properties and investigation of relationships between node attributes (i.e., covariates) and local topologies around nodes.

Highlights

  • A graph is a representation of a set of objects and the relations among them

  • The degree of a node can likewise be generalized into a 73-dimensional vector in which each coordinate represents the number of graphlets that the node touches at a particular orbit; this vector is called the graphlet degree vector (GDV), or the graphlet signature of the node (Milenkovic and Przulj 2008)

  • While inference for complex, highly dependent systems is difficult under the best of conditions, the generative nature of the exponential-family random graph model (ERGM) framework allows us to assess the adequacy of our models by comparison to features of the original data; given that we have identified a model that is both sensible and that successfully regenerates the important properties of the observed network, we have a stronger basis for subsequent investigation than would be obtained from simple rejection of a null hypothesis


Summary

Introduction

A graph (or network) is a representation of a set of objects and the relations among them. Graphlets are small, connected, non-isomorphic induced subgraphs of a graph that have n ≥ 2 nodes (Przulj et al. 2004). The first technique uses the number of occurrences of each graphlet in the graph: the 30-dimensional vector obtained by counting the occurrences of each graphlet of order ≤ 5 in the network is used to describe the topological structure of the graph (Przulj et al. 2004). The graphlet degree distribution is used in a similar manner to the degree distribution to evaluate the topological characteristics of the whole network. The degree of a node can likewise be generalized into a 73-dimensional vector (again for graphlets of order ≤ 5) in which each coordinate represents the number of graphlets that the node touches at a particular orbit; this vector is called the graphlet degree vector (GDV), or the graphlet signature of the node (Milenkovic and Przulj 2008). Various network models describe different rules for the formation of edges; e.g., Erdos-Renyi random graph models (also known as Bernoulli graphs)
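To make the graphlet count vector and the graphlet degree vector concrete, the following is a minimal Python sketch restricted to graphlets of order ≤ 3. It is not the ergm.graphlets implementation; the function name graphlet_summary is hypothetical, and the labels G0 (edge), G1 (3-node path), G2 (triangle) and orbits 0 to 3 follow the commonly used numbering, which is assumed here rather than taken from the text above. The brute-force enumeration over node triples is only suitable for small graphs.

```python
from itertools import combinations


def graphlet_summary(edges):
    """Count 2- and 3-node graphlets and per-node orbit degrees.

    Hypothetical helper for illustration only.
    Returns (counts, gdv) where counts maps "G0", "G1", "G2" to the number
    of induced edges, 3-node paths, and triangles, and gdv maps each node
    to [orbit0, orbit1, orbit2, orbit3].
    """
    # Build an undirected adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = {"G0": 0, "G1": 0, "G2": 0}
    gdv = {v: [0, 0, 0, 0] for v in adj}

    # Orbit 0 is the ordinary degree; G0 is the number of edges.
    for v, nbrs in adj.items():
        gdv[v][0] = len(nbrs)
    counts["G0"] = sum(len(nbrs) for nbrs in adj.values()) // 2

    # Inspect the subgraph induced by every node triple.
    for a, b, c in combinations(sorted(adj), 3):
        ab, ac, bc = b in adj[a], c in adj[a], c in adj[b]
        n_edges = ab + ac + bc
        if n_edges == 3:
            # Triangle: all three nodes sit on orbit 3.
            counts["G2"] += 1
            for v in (a, b, c):
                gdv[v][3] += 1
        elif n_edges == 2:
            # Induced 3-node path: the middle node is adjacent to both others.
            counts["G1"] += 1
            middle = a if (ab and ac) else b if (ab and bc) else c
            for v in (a, b, c):
                gdv[v][2 if v == middle else 1] += 1
    return counts, gdv


# Example: a triangle on nodes 1, 2, 3 with a pendant node 4 attached to 3.
counts, gdv = graphlet_summary([(1, 2), (2, 3), (1, 3), (3, 4)])
print(counts)
print(gdv)
```

Note that triples with fewer than two edges are skipped because they are not connected, and a path is counted only when its two end nodes are not adjacent, which is what "induced" requires.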

Exponential-family random graph modeling
Illustration
Protein secondary structure network
Algorithms and implementation
Discussion
