Abstract

Network sampling has become an indispensable premise and foundation for large-scale network analysis, and its effectiveness largely determines the reliability and practicability of the subsequent analysis results. In this paper, we propose a network sampling algorithm inspired by an epidemic spreading model named the contact process. The contact process is similar to the random walk process but differs from it in two key respects. First, at each time step, a randomly selected sampled node, rather than the latest sampled node, is responsible for recruiting a new node from its neighborhood. Second, the responsible node recruits one of its neighbor nodes with a probability inversely proportional to the degree of that neighbor node, instead of with equal probability. Experiments on nine indiscriminately selected real-world networks show that, compared with seven classical sampling methods, our proposed sampling algorithm has a significant advantage in preserving two basic network properties of the original networks: the degree distribution and the clustering coefficient distribution.

Highlights

  • Nowadays, network sampling has become an indispensable premise and foundation for large-scale network analysis, and its effectiveness determines to a large extent the reliability and practicability of the subsequent network analysis results

  • Blagus et al. [3] empirically compared representative network sampling methods on real-world networks and concluded that the breadth-first search (BFS) and random walk with subgraph induction (RWI) sampling methods show the best overall performance in preserving the degree and clustering coefficient distributions of original networks

  • We propose a network sampling algorithm inspired by an epidemic spreading model, the contact process; we call it contact process sampling (CPS)

Summary

Introduction

Network sampling has become an indispensable premise and foundation for large-scale network analysis, and its effectiveness largely determines the reliability and practicability of the subsequent analysis results.

This step repeats until the sampling size is reached; that is, $|V_S| = \rho \cdot |V|$. Another key difference is that the random walk is memoryless, so a visited node may be visited again in the future, while forest fire and breadth-first search never include repeated nodes.

Metropolis–Hastings random walk (MHRW) is reported in the literature [5, 6] to perform well as a sampling method. It achieves a uniform distribution of sampled nodes by means of its transition probability.

If v is already infected, nothing happens; if v is susceptible, it gets infected. In such a contact process, the fraction of infected nodes on a given network in a steady state depends on two critical factors: the aforementioned death rate p and the contact probability W(k), which is the probability that an infected node chooses a neighbor node of degree k to contact.

Following the conclusion of Yang et al., node u chooses a neighbor node v to recruit into the sample set with probability $p_{u,v} = k_v^{-1} / \sum_{w \in \Gamma_u} k_w^{-1}$, where $k_v$ is the degree of node v and $\Gamma_u$ represents the set of u's neighbor nodes. This recruitment step repeats until the sample set contains $N_S$ distinct nodes. Then, we construct the final sample network with these sampled nodes and the links that connect any two of these sampled nodes in the original network.
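The recruitment procedure described above can be illustrated with a short Python sketch. It is not the authors' implementation: the function name `contact_process_sample`, the adjacency-dictionary input, the uniformly random starting node, the uniform choice of the responsible node, and the `max_steps` safety guard are all assumptions made for illustration. The sketch also leaves out any death (recovery) rate, so sampled nodes never leave the sample set.

```python
import random


def contact_process_sample(adj, n_samples, seed=None, max_steps=1_000_000):
    """Illustrative contact-process-style sampler (hypothetical helper).

    adj: dict mapping each node to the set of its neighbours; assumed to
         describe an undirected graph (symmetric adjacency).
    Returns (sampled_nodes, induced_edges).
    """
    if not 1 <= n_samples <= len(adj):
        raise ValueError("n_samples must be between 1 and the number of nodes")

    rng = random.Random(seed)
    start = rng.choice(list(adj))   # assumption: start from a uniform random node
    order = [start]                 # list copy of the sample for uniform choice
    sampled = {start}

    steps = 0
    while len(sampled) < n_samples:
        if steps >= max_steps:
            # Guard against components smaller than n_samples.
            raise RuntimeError("sample size not reached within max_steps")
        steps += 1

        # A randomly selected sampled node, not the latest one, recruits.
        u = rng.choice(order)
        nbrs = list(adj[u])
        if not nbrs:
            continue

        # Recruit neighbour v with probability proportional to 1 / k_v.
        weights = [1.0 / len(adj[w]) for w in nbrs]
        v = rng.choices(nbrs, weights=weights, k=1)[0]

        # Only distinct nodes count towards the sample size.
        if v not in sampled:
            sampled.add(v)
            order.append(v)

    # Induced subgraph: every original link between two sampled nodes.
    edges = {
        frozenset((a, b))
        for a in sampled
        for b in adj[a]
        if b in sampled and a != b
    }
    return sampled, edges
```

Calling `contact_process_sample(adj, n_samples)` on a connected graph returns the sampled node set together with the induced links. Because `rng.choices` normalizes the weights internally, passing the raw values 1 / k_w is equivalent to using the normalized recruitment probability.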
