Sampling of temporal networks: Methods and biases.

Luis E C Rocha,Petter Holme,Naoki Masuda

doi:10.1103/physreve.96.052302

Open Access

https://doi.org/10.1103/physreve.96.052302

Journal: Physical Review E
Publication Date: Nov 1, 2017
Citations: 79
License type: CC BY

Affiliation: University of Namur, Karolinska Institutet

Abstract

Temporal networks have been increasingly used to model a diversity of systems that evolve in time; for example, human contact structures over which dynamic processes such as epidemics take place. A fundamental aspect of real-life networks is that they are sampled within temporal and spatial frames. Furthermore, one might wish to subsample networks to reduce their size for better visualization or to perform computationally intensive simulations. The sampling method may affect the network structure and thus caution is necessary to generalize results based on samples. In this paper, we study four sampling strategies applied to a variety of real-life temporal networks. We quantify the biases generated by each sampling strategy on a number of relevant statistics such as link activity, temporal paths and epidemic spread. We find that some biases are common in a variety of networks and statistics, but one strategy, uniform sampling of nodes, shows improved performance in most scenarios. Given the particularities of temporal network data and the variety of network structures, we recommend that the choice of sampling methods be problem oriented to minimize the potential biases for the specific research questions on hand. Our results help researchers to better design network data collection protocols and to understand the limitations of sampled temporal network data.

Highlights

Networks have been used to model the interactions and interdependencies between the parts of a system [1].
We focus on statistics that are typically used to characterize temporal activity, paths, and spreading processes, and on data sets that are relevant to the study of human dynamics and of epidemic and information spread in different contexts.
The high turnover of nodes in the sex-workers and their clients (SEX) network explains why the number of nodes falls more substantially in this case than in the email communication within a university (EMA) network if we reduce Ts.

Summary

Introduction

Networks have been used to model the interactions and interdependencies between the parts of a system [1]. When modeling real systems as networks, researchers sample data by extracting the relevant information within a given temporal and spatial frame [3], trace-routing or snowballing from one or multiple sources [4,5], or by collecting all network-related information of a specific system, for example, email exchanges within a university or social interactions on a web community [2,6]. Sampling network data involves at least four main decisions: the choice of (i) the total observation, or sampling, time (e.g., 1 day or 1 year); (ii) which nodes and (iii) links will be observed (e.g., all or a fraction); and (iv) the temporal resolution, i.e., the time interval in which data are recorded. Temporal networks describe more realistically the temporal paths through which informa-
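
The four sampling decisions listed above can be illustrated with a minimal Python sketch. It assumes a temporal network stored as a list of contacts `(i, j, t)` with integer timestamps; the function names, the induced-subgraph rule for node sampling, and the windowing rule for temporal resolution are illustrative assumptions, not the procedures defined in the paper.

```python
import random
from typing import List, Tuple

# Assumed representation: one contact = (node_i, node_j, timestamp).
Contact = Tuple[int, int, int]


def sample_time(contacts: List[Contact], t_start: int, t_end: int) -> List[Contact]:
    """(i) Observation time: keep contacts with t_start <= t < t_end."""
    return [c for c in contacts if t_start <= c[2] < t_end]


def sample_nodes(contacts: List[Contact], fraction: float,
                 rng: random.Random) -> List[Contact]:
    """(ii) Uniform node sampling: pick a fraction of nodes uniformly at
    random and keep contacts whose two endpoints were both picked
    (an assumption; other conventions are possible)."""
    nodes = sorted({n for i, j, _ in contacts for n in (i, j)})
    kept = set(rng.sample(nodes, round(fraction * len(nodes))))
    return [c for c in contacts if c[0] in kept and c[1] in kept]


def sample_links(contacts: List[Contact], fraction: float,
                 rng: random.Random) -> List[Contact]:
    """(iii) Uniform link sampling: pick a fraction of distinct undirected
    node pairs and keep every contact on those pairs."""
    links = sorted({(min(i, j), max(i, j)) for i, j, _ in contacts})
    kept = set(rng.sample(links, round(fraction * len(links))))
    return [c for c in contacts if (min(c[0], c[1]), max(c[0], c[1])) in kept]


def coarsen_resolution(contacts: List[Contact], window: int) -> List[Contact]:
    """(iv) Temporal resolution: map each timestamp to the start of its
    window and drop repeated contacts on the same pair within a window.
    Node pairs are returned in (smaller, larger) order."""
    seen = set()
    out = []
    for i, j, t in contacts:
        key = (min(i, j), max(i, j), (t // window) * window)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out
```

For example, `sample_time(contacts, 0, 10)` keeps only the contacts recorded in the first 10 time units, and `sample_nodes(contacts, 0.5, random.Random(0))` gives a reproducible half-sized node sample. Passing an explicit `random.Random` instance keeps each sample reproducible, which helps when comparing statistics across repeated samples.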

Methods

Results

Conclusion