Abstract

Online social networks (OSNs) have become important platforms for efficiently connecting people and promoting information dissemination, which is of great importance to our social life and society. However, due to privacy concerns and access limitations, it is difficult to obtain the whole network of an OSN for analysis, so it is critical to have a representative subgraph. For the same reasons, the original network is usually unavailable as ground truth, which makes it very difficult to evaluate how unbiased a sampling method is, let alone how representative. Uniform sampling (UNI) [Gjoka et al. 2010] was therefore proposed to obtain a nodal property distribution that is an unbiased estimate of that of the original network, so that it can serve as a reference for assessing the bias of other methods. However, UNI sampling suffers from low efficiency, and the representativeness and connectivity of the subgraph it yields, formed by the sampled nodes and the connections between them, have rarely been studied. We propose an adaptive UNI sampling method (adpUNI) that overcomes these disadvantages by dividing the userID space into several intervals whose sampling probabilities adapt to their target rates. By also adding the neighbors of each targeted node to the sample set (adpUNI+N), we further improve sampling efficiency and obtain a better-connected and more representative subgraph. When applied to Sina Weibo and Twitter, our methods outperform other classical methods in sampling efficiency and always achieve better connectivity and representativeness than UNI sampling. We also find that an unbiased sample does not guarantee a more representative subgraph.
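The sampling schemes named in the abstract can be illustrated with a small Python sketch. `uni_sample` is plain rejection sampling over a userID range, which is how UNI is usually described. The abstract does not spell out adpUNI's update rule, so `adp_uni_sample` is a hypothetical version built on our own assumptions: "target rate" is read as the fraction of probed IDs in an interval that belong to existing users, and each interval is chosen with probability proportional to its length times a smoothed estimate of that rate. Because this favors dense intervals, each accepted ID also carries an importance weight |I_i| / p_i that can be used to reweight estimates back toward UNI's uniform target. `add_neighbors` mirrors the adpUNI+N idea of keeping the targeted node's neighbors. All function names, the smoothing, and the `floor` parameter are illustrative, and `is_valid` stands in for whatever lookup (for example an API call) tells whether an ID exists.

```python
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple


def uni_sample(lo: int, hi: int, is_valid: Callable[[int], bool], n: int,
               rng: random.Random) -> Tuple[List[int], int]:
    """UNI: probe userIDs uniformly at random in [lo, hi) and keep existing users.

    Sampling is with replacement; the loop never ends if no ID in the range is valid.
    Returns the accepted IDs and the number of probes (queries) spent.
    """
    sample: List[int] = []
    probes = 0
    while len(sample) < n:
        uid = rng.randrange(lo, hi)
        probes += 1
        if is_valid(uid):  # rejection step: nonexistent IDs are discarded
            sample.append(uid)
    return sample, probes


def adp_uni_sample(intervals: Sequence[Tuple[int, int]],
                   is_valid: Callable[[int], bool], n: int,
                   rng: random.Random,
                   floor: float = 0.01) -> Tuple[List[Tuple[int, float]], int]:
    """Hypothetical adaptive variant (an assumption, not the paper's exact rule).

    Each non-empty interval [lo, hi) is picked with probability proportional to
    its length times a smoothed estimate of its hit rate, so dense intervals are
    probed more often. Every accepted ID is returned with the importance weight
    |I_i| / p_i, which undoes this preference when estimates are reweighted.
    """
    sizes = [hi - lo for lo, hi in intervals]
    hits = [0] * len(intervals)
    tries = [0] * len(intervals)
    sample: List[Tuple[int, float]] = []
    probes = 0
    while len(sample) < n:
        # Laplace-smoothed hit rate, floored so a sparse interval keeps a minimum share.
        rates = [max((h + 1) / (t + 2), floor) for h, t in zip(hits, tries)]
        weights = [r * s for r, s in zip(rates, sizes)]
        i = rng.choices(range(len(intervals)), weights=weights)[0]
        lo, hi = intervals[i]
        uid = rng.randrange(lo, hi)
        probes += 1
        tries[i] += 1
        if is_valid(uid):
            hits[i] += 1
            p_i = weights[i] / sum(weights)  # probability this interval was chosen
            sample.append((uid, sizes[i] / p_i))
    return sample, probes


def add_neighbors(seeds: List[int], neighbors: Dict[int, Set[int]]) -> Set[int]:
    """adpUNI+N idea: keep every neighbor of each targeted node as well."""
    nodes = set(seeds)
    for u in seeds:
        nodes |= neighbors.get(u, set())
    return nodes
```

On a toy ID space where [0, 1000) is fully occupied and [1000, 10000) holds a user only every 100 IDs, the adaptive sketch typically needs far fewer probes than `uni_sample` for the same number of accepted users. To estimate the mean of a nodal property f under the uniform target, one can compute sum(w * f(u)) / sum(w) over the returned (u, w) pairs.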

Highlights

  • Nowadays, complex networks are ubiquitous in our world; among them, online social networks (OSNs) play a crucial role in society by efficiently connecting people and promoting social interactions

  • The two OSNs studied here are typical examples: Sina Weibo is quite sparse and heterogeneous over the whole userID space, whereas Twitter is relatively denser and more uniform. Both are directed networks, but [54], [55] have shown that they can be treated as undirected graphs when sampling, i.e., a connection in either direction between two nodes is regarded as a bidirectional edge; this is how we handle both Sina Weibo and Twitter

  • In this article, we propose fast, representative sampling methods for large-scale OSNs that, building on the observation that the userID space is heterogeneous, significantly improve sampling efficiency and performance
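The undirected treatment described in the highlights amounts to symmetrizing the follow relation. Below is a minimal Python sketch, assuming edges arrive as (follower, followee) pairs; `to_undirected` and `induced_subgraph` are hypothetical helper names, and dropping self-loops is our own choice. `induced_subgraph` builds the subgraph formed by the sampled nodes and the connections between them, which is the object whose connectivity and representativeness the abstract evaluates.

```python
from typing import Dict, Iterable, Set, Tuple


def to_undirected(edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Build an undirected adjacency map from directed (follower, followee) pairs.

    A connection in either direction becomes one bidirectional edge, so
    reciprocal follows collapse into a single edge. Self-loops are dropped.
    """
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def induced_subgraph(nodes: Set[int], adj: Dict[int, Set[int]]) -> Set[Tuple[int, int]]:
    """Edges among the sampled nodes, each stored once as (smaller ID, larger ID)."""
    return {(min(u, v), max(u, v))
            for u in nodes
            for v in adj.get(u, set())
            if v in nodes}
```

Storing adjacency as sets makes the symmetrization idempotent, so duplicate or reciprocal directed edges never inflate the degree of a node.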



Introduction

Complex networks are ubiquitous in our world; among them, online social networks (OSNs) play a crucial role in society by efficiently connecting people and promoting social interactions. By the end of 2018, Twitter, one of the most popular micro-blogging platforms in the world, had 321 million monthly active users [10]; for Facebook, this. OSNs are typical instances of complex networks. Since the emergence of Web 2.0, OSNs have become a free and efficient mass medium through which users can present themselves to and interact with a wider public [7], going well beyond a simple communication channel. They have attracted great attention from users, researchers and policy makers.

