Abstract

Network models are applied in numerous domains where data arise from systems of interactions among pairs of actors. Both statistical and mechanistic network models are increasingly capable of capturing various dependencies among these actors. Yet these dependencies pose statistical challenges for analyzing such data, especially when the data set comprises only a single observation of one network, often leading to intractable likelihoods regardless of the modeling paradigm and limiting the application of existing statistical methods for networks. We explore a subsampling bootstrap procedure that circumvents the intractability of such likelihoods and serves as the basis for goodness of fit and model selection with a single observed network. Our approach is based on flexible resampling distributions formed from the single observed network, allowing for more nuanced and higher-dimensional comparisons than point estimates of quantities of interest. We include worked examples of model selection, using simulated data, and of goodness-of-fit assessment, using duplication-divergence model fits to yeast (S. cerevisiae) protein-protein interaction data from the literature. The proposed approach produces a flexible resampling distribution that can be based on any network statistics of one’s choosing and can be employed for both statistical and mechanistic network models.
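To make the idea concrete, the following is a minimal sketch of a subsampling bootstrap on a single observed network: repeatedly draw node-induced subgraphs and record a vector of network statistics, yielding a resampling distribution rather than a single point estimate. The subsampling scheme (node-induced subgraphs), the subsample fraction, the chosen statistics, and the stand-in "observed" network are illustrative assumptions, not the authors' exact procedure.

```python
# Minimal sketch of a subsampling bootstrap from one observed network.
# Scheme, fraction, and statistics are illustrative assumptions only.
import random
import networkx as nx

def network_stats(g):
    """Network statistics of one's choosing; these three are illustrative."""
    n = g.number_of_nodes()
    return {
        "density": nx.density(g),
        "transitivity": nx.transitivity(g),
        "mean_degree": 2.0 * g.number_of_edges() / n if n else 0.0,
    }

def subsample_distribution(g, frac=0.5, n_boot=200, seed=0):
    """Resampling distribution of statistics over node-induced subgraphs."""
    rng = random.Random(seed)
    nodes = list(g.nodes())
    k = max(2, int(frac * len(nodes)))
    draws = []
    for _ in range(n_boot):
        sub = g.subgraph(rng.sample(nodes, k))
        draws.append(network_stats(sub))
    return draws

# Stand-in observed network (hypothetical data) for demonstration.
observed = nx.erdos_renyi_graph(200, 0.05, seed=1)
boot = subsample_distribution(observed, frac=0.5, n_boot=100)
print(boot[0])
```

The resulting list of statistic vectors can then be compared across candidate models, which is what allows comparisons beyond point estimates.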

Highlights

  • Networks are used to represent data from systems composed of interactions among pairs of actors[1,2,3,4,5]

  • Statistical models are probabilistic models that specify the likelihood of observing any given network[7,8,9]. One example is the family of exponential random graph models (ERGMs)[4], which uses observable network configurations as its natural sufficient statistics (see the standard form written out after this list)

  • While mechanistic models are easy to sample from, they allow for numerous paths through the state space that can produce any one observed network, making the likelihood intractable
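For reference, the ERGM family mentioned above has the standard exponential-family form below, where $g(y)$ collects the observable network configurations used as sufficient statistics, $\theta$ are the natural parameters, and the normalizing constant $\kappa(\theta)$ sums over all possible networks on the same node set, which is what makes the likelihood hard to compute in practice.

```latex
% Standard ERGM form: g(y) are configuration counts (sufficient statistics),
% theta the natural parameters, kappa(theta) the normalizing constant.
P_\theta(Y = y) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} g(y')\}
```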



Introduction

Networks are used to represent data from systems composed of interactions among pairs of actors (represented by nodes)[1,2,3,4,5]. Although the different resampling methods operate in different ways, they all serve to create new data sets from a single observed data set that mimic the behavior of the original one, providing a basis for statistical procedures. This is an attractive option for networks, especially when there is only a single observed network, such as the Internet or the World Wide Web. Having multiple resampled networks that resemble, in some ways, the original observed network allows one to bypass the problem of unwieldy or intractable likelihoods. These methods generally work by drawing network realizations from the fitted model and assessing fit by comparing the value of a set of network statistics for the observed network to the distribution of these statistics in the generated draws. This resampling scheme is akin to that of the parametric bootstrap. Ohara et al.[27], Bhattacharyya et al.[28], Thompson et al.[30], and Gel et al.[31] are aimed at estimation and uncertainty quantification of network centrality, the distribution of subgraphs, and functionals of the degree distribution, while Ali et al.[29] is a subgraph-based method for network comparisons.
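The parametric-bootstrap-style comparison described above can be sketched as follows: simulate network realizations from a fitted model and locate the observed statistic within the simulated distribution. The fitted model here (an Erdős–Rényi graph with the empirical edge probability), the statistic (transitivity), and the two-sided tail-probability summary are stand-in assumptions for illustration, not the specific models or statistics used in the paper.

```python
# Sketch of the goodness-of-fit comparison described above: draw network
# realizations from a fitted model and compare the observed statistic to the
# distribution of that statistic across the draws (parametric-bootstrap style).
# The stand-in fitted model (Erdos-Renyi) and statistic are illustrative.
import networkx as nx

def gof_tail_probability(observed, n_draws=200, seed=0):
    n = observed.number_of_nodes()
    p_hat = nx.density(observed)          # MLE edge probability for the stand-in model
    obs_stat = nx.transitivity(observed)  # statistic of interest (illustrative)
    draws = [
        nx.transitivity(nx.erdos_renyi_graph(n, p_hat, seed=seed + i))
        for i in range(n_draws)
    ]
    # Two-sided tail probability of the observed value under the simulated draws.
    lower = sum(s <= obs_stat for s in draws) / n_draws
    upper = sum(s >= obs_stat for s in draws) / n_draws
    return min(1.0, 2.0 * min(lower, upper)), draws

observed = nx.karate_club_graph()  # any single observed network
p_value, _ = gof_tail_probability(observed)
print(f"approximate GOF tail probability: {p_value:.3f}")
```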
