Comparing Graph Sampling Methods Based on the Number of Queries

Kenta Iwasaki,Kazuyuki Shudo

doi:10.1109/bdcloud.2018.00168

Abstract

Random walk-based graph sampling methods can effectively estimate feature values in large-scale social networks wherein the node IDs are unknown. Real social networks are sampled by repeatedly querying their APIs to acquire the lists of adjacent nodes. These queries can then become a bottleneck in the sampling process because nearly all social network services restrict the rate at which queries can be issued. However, most existing graph sampling studies have not focused on the number of queries but have instead compared methods based on sample size. Therefore, such graph sampling methods cannot be recommended for estimating feature values in actual social networks. This study presents an approach to assess graph sampling methods that focus on the number of queries. This describes the time taken by algorithms for typical random walk-based graph sampling methods, such as a simple random walk with re-weighting (SRW-rw), a non-backtracking random walk with re-weighting (NBRW-rw), and a Metropolis ? Hastings random walk (MHRW), which require queries. The graph sampling precision was then experimentally evaluated based on sample size and query number standards using actual social networks, and the types of changes that occur were observed.

Full Text