Abstract

Breadth First Search (BFS) is a widely used approach for sampling large graphs. However, it has been empirically observed that BFS sampling is biased toward high-degree nodes, which may strongly affect the measurement results. In this paper, we quantify and correct the degree bias of BFS. First, we consider a random graph RG(pk) with an arbitrary degree distribution pk. For this model, we calculate the node degree distribution expected to be observed by BFS as a function of the fraction f of covered nodes. We also show that, for RG(pk), all commonly used graph traversal techniques (BFS, DFS, Forest Fire, Snowball Sampling, RDS) have exactly the same bias. Next, we propose a practical BFS-bias correction procedure that takes as input a collected BFS sample together with the fraction f. Our correction technique is exact (i.e., leads to unbiased estimation) for RG(pk). Furthermore, it performs well when applied to a broad range of Internet topologies and to two large BFS samples of Facebook and Orkut networks.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call