Abstract

We perform an extensive analysis of how sampling affects the estimates of several relevant network measures. In particular, we focus on how a sampling strategy optimized to recover a particular spectral centrality measure affects other topological quantities. Our goal is twofold. On the one hand, we extend the analysis of the behavior of TCEC (Ruggeri and De Bacco, in: Cherifi, Gaito, Mendes, Moro, Rocha (eds) Complex networks and their applications VIII, Springer, Cham, pp 90–101, 2020), a theoretically grounded sampling method for eigenvector centrality estimation. On the other hand, we demonstrate more broadly how sampling can affect the estimation of relevant network properties, such as centrality measures other than the one being optimized, community structure and node attribute distribution. In addition, we analyze sampling behaviors in various instances of network generative models. Finally, we adapt the theoretical framework behind TCEC to the case of PageRank centrality and propose a sampling algorithm aimed at optimizing its estimation. We show that, while the theoretical derivation can be suitably adapted to cover this case, the resulting algorithm suffers from a high computational complexity that requires further approximations compared to the eigenvector centrality case.

Main contributions: (a) an extensive empirical analysis of the impact of the TCEC sampling method (optimized for eigenvector centrality recovery) on different centrality measures, community structure, node attributes and statistics related to specific network generative models; (b) an extension of TCEC to optimize PageRank estimation.
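To make the kind of distortion studied here concrete, the following minimal Python sketch compares eigenvector centrality on a toy graph with the same measure on a node-induced sample. It is an illustration only: the graph, the hand-picked sample and the helper names (`eigenvector_centrality`, `induced_subgraph`) are assumptions made for this example, and the sample is not chosen by TCEC.

```python
import math

def eigenvector_centrality(adj, iters=200):
    """Power iteration for an undirected graph given as {node: set_of_neighbours}.

    Iterating with A + I instead of A keeps the leading eigenvector but
    avoids oscillation on bipartite graphs such as the toy tree below.
    """
    scores = {v: 1.0 for v in adj}
    for _ in range(iters):
        updated = {v: scores[v] + sum(scores[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(s * s for s in updated.values()))
        scores = {v: s / norm for v, s in updated.items()}
    return scores

def induced_subgraph(adj, sample):
    """Keep only the sampled nodes and the edges among them."""
    return {v: adj[v] & sample for v in sample}

# Hypothetical toy graph: hub "a" has three leaves, hub "b" has two, and the hubs are linked.
edges = [("a", "a1"), ("a", "a2"), ("a", "a3"), ("a", "b"), ("b", "b1"), ("b", "b2")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

# Hand-picked sample (an assumption, not a TCEC output): it misses all leaves of "a".
sample = {"a", "b", "b1", "b2"}
full_scores = eigenvector_centrality(graph)
sample_scores = eigenvector_centrality(induced_subgraph(graph, sample))

top_full = max(full_scores, key=full_scores.get)
top_sample = max(sample_scores, key=sample_scores.get)
print("most central node, full graph:", top_full)
print("most central node, sampled graph:", top_sample)
```

In this toy case the sample misses most neighbours of hub `a`, so the top-ranked node differs between the full graph and the sampled one.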

Highlights

  • When investigating real-world network datasets, we often do not have access to the entire network information.

  • Implementation details: while we refer to Ruggeri and De Bacco (2020) for the detailed definitions of the parameters needed in the algorithmic implementation, we provide a summary of the values used in our experiments in “Appendix 1”; we use the open-source implementation of Theoretical Criterion for Eigenvector Centrality (TCEC) available online.

  • We investigated the impact that sampling techniques have on various centrality measures, community structure, node attribute distribution and further statistics relevant to specific instances of network generative models.


Summary

Introduction

When investigating real-world network datasets, we often do not have access to the entire network information. In disassortative networks, the likely candidates belong to different communities, hence the more homogeneous exploration. As the theoretical groundings behind TCEC and TCPR are similar, we argue that using the L1-norm in TCPR (see “Appendix 8”), which is inherently less discriminative than the L2-norm behind TCEC, seems to contribute to this difference in performance. Another possible cause is the extra assumption that in-sample nodes’ degrees scale linearly with sample size. Large deviations from this assumption could noticeably affect the quality of the goodness criterion at hand.
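The remark about norms can be illustrated with a small numerical sketch. It is only one possible reading of "less discriminative", offered as an assumption rather than as the authors' argument: two hypothetical score-change vectors with the same L1-norm can have different L2-norms, so the L2-norm separates cases that the L1-norm treats as equal. The vectors below are invented for illustration.

```python
import math

def l1_norm(vec):
    """Sum of absolute entries."""
    return sum(abs(x) for x in vec)

def l2_norm(vec):
    """Euclidean length."""
    return math.sqrt(sum(x * x for x in vec))

# Hypothetical changes in node scores: one concentrated on a single node,
# one spread evenly over four nodes (both invented for this example).
concentrated = [1.0, 0.0, 0.0, 0.0]
spread = [0.25, 0.25, 0.25, 0.25]

for name, vec in (("concentrated", concentrated), ("spread", spread)):
    print(name, "L1 =", l1_norm(vec), "L2 =", l2_norm(vec))
```

Both vectors have the same L1-norm, while the L2-norm of the concentrated change is twice that of the spread one.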

