Mining the Hidden Link Structure from Distribution Flows for a Spatial Social Network

Yanqiao Zheng, Xiaoqi Zhang, Xiaobing Zhao, Xinyue Ye, Qiwen Dai

doi:10.1155/2019/6902027

Abstract

This study develops a non-(semi-)parametric method to extract the hidden network structure from {0,1}-valued distribution flow data in which the links between nodes are not observed. Such input data are common in studies of information propagation, such as rumor spreading through social media. In that case, a social network does exist as the medium of the spreading process, but its link structure is completely unobservable; it is therefore important to infer the structure (links) of the hidden network. Unlike previous studies on this topic, which consider only abstract networks, we believe that, apart from the link structure, the social-economic features and geographic locations of nodes can also play critical roles in shaping the spreading process and have to be taken into account. To uncover the hidden link structure and its dependence on the external social-economic features of the node set, this study constructs a multidimensional spatial social network model whose spatial dimension is large enough to account for all influential social-economic factors. Based on the spatial network, we propose a nonparametric mean-field equation to govern the rumor spreading process and apply the likelihood estimator to infer the unknown link structure from the observed rumor distribution flows. Our method turns out to be easily extended to cover the class of block networks, which are useful in most real applications. The method is tested on simulated data and demonstrated on a data set of rumor spreading on Twitter.

Highlights

Flow data has been widely studied by different disciplines [1,2,3,4,5,6]
The distribution flows are highly “context-dependent”, which means the social-economic factors behind every agent joining the spreading process might significantly affect the speed, extent, and coverage of spreading, suggesting a spatial social network to be uncovered from the distribution flows
Simulation-based estimation is efficient for estimation in agent-based modelling, simulation, and calibration (ABM), as it is often impossible to derive an analytic expression for the standard error functions in the ABM setting; simulation can generate an empirical version of the error function and so facilitate the application of standard ordinary least squares (OLS) and maximum likelihood (ML) estimation strategies
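
To make the last highlight concrete, the following is a minimal sketch of simulation-based least squares. It is not the authors' estimator: the well-mixed spreading rule, the single parameter `beta`, and the helper names `simulate_fraction_active` and `simulated_least_squares` are all assumptions made for illustration. The idea shown is only that, when no closed-form error function is available, averaging repeated simulations gives an empirical curve that can be compared with the observed one.

```python
import numpy as np


def simulate_fraction_active(beta, n_nodes, steps, rng):
    """Simulate a toy well-mixed spreading process (an assumption, not the
    paper's model) and return the fraction of active nodes at each step."""
    active = np.zeros(n_nodes, dtype=bool)
    active[0] = True  # one initial spreader
    frac = np.empty(steps)
    frac[0] = active.mean()
    for t in range(1, steps):
        # Each node becomes active with probability beta * (current active share).
        p = beta * active.mean()
        active |= rng.random(n_nodes) < p
        frac[t] = active.mean()
    return frac


def simulated_least_squares(observed, candidates, n_nodes, n_sims=200, seed=0):
    """Pick the candidate beta whose simulated mean curve is closest, in
    squared error, to the observed curve."""
    steps = len(observed)
    losses = []
    for beta in candidates:
        # Re-seeding per candidate gives common random numbers across candidates.
        rng = np.random.default_rng(seed)
        curves = [simulate_fraction_active(beta, n_nodes, steps, rng)
                  for _ in range(n_sims)]
        mean_curve = np.mean(curves, axis=0)  # empirical stand-in for the model curve
        losses.append(float(np.sum((mean_curve - observed) ** 2)))
    losses = np.array(losses)
    return candidates[int(np.argmin(losses))], losses
```

A grid search is used here only to keep the sketch short; any optimizer could replace it, and a simulated likelihood could replace the squared-error loss.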

Summary

Introduction

Flow data has been widely studied by different disciplines [1,2,3,4,5,6]. In recent years especially, the development of the internet has made an increasing number of flow data sets publicly available; among them, new types of flows are emerging and attracting more and more attention from scholars [7, 8]. Unlike physical movement data, such as taxi trajectories, information flow data, such as the time series of the retweet status of a class of tweets within a population, does not contain any trajectory-level information, because a user may retweet after seeing that many friends have done so. A group of friends can jointly contribute to the spreading of the tweet, so it becomes impossible to identify a single real source, nor is it possible to track the trajectory of retweeting. Such flow data is no longer stored as a collection of well-defined trajectories; instead, it consists of a time series of distributions of a given kind of information within the entire population. The distribution flows are highly “context-dependent”, which means the social-economic factors behind every agent joining the spreading process (such as education, income, and neighborhood) might significantly affect the speed, extent, and coverage of spreading, suggesting a spatial social network to be uncovered from the distribution flows.
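
The shape of such data can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the model of the paper: the distance-decay link rule, the parameters `beta` and `scale`, and the function name `simulate_distribution_flow` are hypothetical. It produces a {0,1}-valued matrix with one row per time step and one column per node, which records who holds the information at each time but not who passed it to whom; the adjacency matrix is returned only for inspection and would be hidden in practice.

```python
import numpy as np


def simulate_distribution_flow(positions, steps, beta=0.5, scale=1.0, seed=0):
    """Return (flow, adjacency) for a toy spreading process.

    positions: (n, d) array of node features/locations (hypothetical inputs).
    flow:      (steps, n) int8 array, flow[t, i] = 1 if node i is active at t.
    """
    rng = np.random.default_rng(seed)
    n = positions.shape[0]

    # Assumed link rule: link probability decays with distance in feature space.
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    upper = np.triu(rng.random((n, n)) < np.exp(-dist / scale), k=1)
    adjacency = upper | upper.T  # symmetric, no self-loops

    active = np.zeros(n, dtype=bool)
    active[rng.integers(n)] = True  # one random initial spreader
    flow = np.zeros((steps, n), dtype=np.int8)
    flow[0] = active
    for t in range(1, steps):
        # Count active neighbours; several friends may jointly trigger adoption,
        # so no single source is recorded.
        k = adjacency.astype(np.int64) @ active.astype(np.int64)
        p_adopt = 1.0 - (1.0 - beta) ** k
        active = active | (rng.random(n) < p_adopt)
        flow[t] = active
    return flow, adjacency
```

For example, calling the function with 30 nodes placed at random in a two-dimensional feature space and `steps=10` yields a 10 × 30 binary array; only this array corresponds to what an analyst would observe.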

Objectives

Methods

Findings

Conclusion