Real time enhanced random sampling of online social networks

Giannis Haralabopoulos, Ioannis Anagnostopoulos

doi:10.1016/j.jnca.2013.10.016

Abstract

Social graphs can be easily extracted from Online Social Networks (OSNs). However, as these networks grow and evolve over time, conventional sampling methods used to evaluate large graph information cannot accurately project network properties. Furthermore, in an attempt to deal with ever-increasing access and possible malicious incidents (e.g. Denial of Service attacks), OSNs introduce access limitations for their data, making the crawling/sampling process even harder. A novel approach to random sampling is proposed, considering both the limitations set by OSNs and the resources available. We evaluate our proposal with 4 different settings on 14 different test graphs, crawled directly from Twitter. Additionally, we test our methods on various graphs from the Stanford Network Analysis Project collection. Results show that every scenario needs a different approach. Conventional Random Node Sampling is better suited to small sample sizes, while Enhanced Random Node Sampling provides quicker and better results for larger samples. Still, many questions arise from this work that can be considered as future research topics.

Full Text