Abstract

Unbiased sampling of online social networks (OSNs) makes it possible to estimate the statistical properties of large-scale OSNs accurately. However, the most commonly used sampling methods, Breadth-First Search (BFS) and Greedy, are known to be biased towards high-degree nodes, which yields inaccurate statistical results. To give a general requirement for unbiased sampling, we model the crawling process as a Markov chain and derive a necessary and sufficient condition, which enables us to design various efficient unbiased sampling methods. To the best of our knowledge, we are among the first to give such a condition. Metropolis-Hastings Random Walk (MHRW) is one example that satisfies the condition. However, walkers in MHRW may stay at some low-degree nodes for a long time, resulting in a considerable number of self-loops on these nodes, which adversely affects crawling efficiency. Based on the condition, we propose a new unbiased sampling method, called USRS, that reduces the probability of self-loops. We use a dataset from Renren, the largest OSN in China, to evaluate the performance of USRS. The results demonstrate that USRS generates unbiased samples with low self-loop probabilities and achieves higher crawling efficiency.
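
To make the self-loop behaviour of MHRW concrete, the following is a minimal sketch of the textbook Metropolis-Hastings random walk with a uniform target distribution over nodes. The abstract does not spell out the walk's details, so the acceptance rule min(1, deg(u)/deg(v)), the function name `mhrw`, and the five-node toy graph are all assumptions made for illustration; this is neither USRS nor the Renren dataset.

```python
import random
from collections import Counter


def mhrw(graph, start, steps, rng):
    """Textbook MHRW sketch (assumed form): uniform target over nodes.

    graph maps each node to a list of its neighbours (undirected).
    Returns per-node visit counts and the number of rejected moves,
    i.e. steps where the walker stayed put (self-loops).
    """
    current = start
    visits = Counter()
    self_loops = 0
    for _ in range(steps):
        candidate = rng.choice(graph[current])
        # Accept with probability min(1, deg(current) / deg(candidate)).
        # A low-degree node proposing a high-degree neighbour is often rejected.
        if rng.random() < len(graph[current]) / len(graph[candidate]):
            current = candidate
        else:
            self_loops += 1
        visits[current] += 1
    return visits, self_loops


# Hypothetical toy graph: one hub of degree 4 and four low-degree nodes.
graph = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub", "b"],
    "b": ["hub", "a"],
    "c": ["hub"],
    "d": ["hub"],
}

steps = 200_000
visits, self_loops = mhrw(graph, "hub", steps, random.Random(0))
freqs = {node: visits[node] / steps for node in graph}
self_loop_rate = self_loops / steps

for node in sorted(freqs):
    print(node, round(freqs[node], 3))
print("self-loop rate:", round(self_loop_rate, 3))
```

In this sketch the visit frequencies should come out close to uniform despite the unequal degrees, while a sizeable share of steps are rejections at the degree-1 nodes `c` and `d`. That is the inefficiency the abstract attributes to MHRW and that USRS is designed to reduce.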
