Abstract

Due to the rapid development of various social networks, the spatial autoregressive (SAR) model is becoming an important tool in social network analysis. However, major bottlenecks remain in analyzing large-scale networks (e.g., Facebook has over 700 million active users), including computational scalability, estimation consistency, and proper network sampling. To address these challenges, we propose a novel least squares estimator (LSE) for analyzing large sparse networks based on the SAR model. Computationally, the cost of the LSE is linear in the network size, making it scalable to the analysis of huge networks. In theory, the LSE is $\sqrt{n}$-consistent and asymptotically normal under certain regularity conditions. A new LSE-based network sampling technique is further developed, which automatically adjusts for autocorrelation between sampled and unsampled units and hence guarantees valid statistical inference. Moreover, we generalize the LSE approach for the classical SAR model to more complex networks with multiple sources of social interaction effects. Numerical results for simulated and real data are presented to illustrate the performance of the LSE.

Highlights

  • We consider a network with n nodes

  • We develop a novel sampling scheme to accompany the least squares estimator (LSE) approach, and further show that the sampled data lead to consistent estimation for the spatial autoregressive (SAR) model

  • It would be intriguing to study the problem without the network sparsity assumption


Summary

Introduction

We consider a network with $n$ nodes. An adjacency matrix $A = (a_{ij}) \in \mathbb{R}^{n \times n}$ can be defined to describe the network structure. Huang et al. (2018) proposed a pseudo-likelihood estimator for the SAR model with random effects. Because this is a likelihood-type method, complex matrix computation (e.g., of a log-determinant) is needed. More efficient algorithms have been proposed (Barry and Pace, 1999; Smirnov and Anselin, 2001; LeSage and Pace, 2007). These methods usually rely on stringent assumptions on $I_n - \rho W$, which can hardly hold for real social network data. Better techniques for network sampling are also needed to ensure consistent estimation of the social interaction effect. Motivated by these challenges, we propose a novel, fast, and scalable estimation method for the SAR model.
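To make the computational bottleneck concrete, the sketch below evaluates a Gaussian log-likelihood for the textbook pure SAR form $Y = \rho W Y + \varepsilon$, where the log-determinant of $I_n - \rho W$ appears explicitly. This is an illustration of the likelihood-type approach, not the LSE proposed in the paper. The model form without covariates, the row-normalization of $A$ into $W$, and the function names `row_normalize` and `sar_loglik` are all assumptions made for the example; the paper's exact specification is not shown in this summary.

```python
import numpy as np


def row_normalize(A):
    """Turn an adjacency matrix A into a weight matrix W whose nonzero rows sum to 1.

    Row-normalization is an assumption here; nodes with no edges keep an all-zero row.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1, keepdims=True)
    return np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)


def sar_loglik(rho, sigma2, Y, W):
    """Gaussian log-likelihood of the pure SAR model Y = rho * W @ Y + eps.

    Uses a dense log-determinant, which costs O(n^3) in general; this is the
    step that becomes prohibitive for very large networks.
    """
    n = len(Y)
    S = np.eye(n) - rho * W
    _, logdet = np.linalg.slogdet(S)  # log|det(I_n - rho W)|
    resid = S @ Y
    return -0.5 * n * np.log(2 * np.pi * sigma2) + logdet - resid @ resid / (2 * sigma2)


if __name__ == "__main__":
    # Tiny hypothetical network: a path on three nodes.
    A = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
    W = row_normalize(A)
    Y = np.array([0.3, -1.2, 0.8])
    print(sar_loglik(0.4, 1.0, Y, W))
```

Every likelihood evaluation repeats the determinant computation for a new value of $\rho$, which is why a method that avoids it can scale to much larger networks.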

Motivation
Least squares estimation
Asymptotic properties
New LSE-based scheme for sampling networks
Numerical studies
Performance of the LSE
Performance of the sample-LSE
Performance of the mLSE
Sina Weibo network analysis
Conclusion
Proof of Proposition 1 and Proposition 3
Proof of Proposition 2
Proof of Theorem 1
Proof of Theorem 2
