Abstract

Statistical analysis of networks is an active research area, and the literature contains many papers on network models and their statistical analysis. However, very few papers deal with missing data in network analysis, even though, in practice, networks are often observed with missing values. In this paper we focus on the Stochastic Block Model with valued edges and consider an MCAR (missing completely at random) setting by assuming that every dyad (pair of nodes) is sampled independently of the others with the same probability $\rho >0$. We prove that the maximum likelihood estimator and its variational approximation are consistent and asymptotically normal in the presence of missing data as soon as the sampling probability $\rho $ satisfies $\rho \gg \log (n)/n$.
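
The sampling scheme described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes an undirected network with Bernoulli edges (a simple special case of valued edges), and the function names, block proportions, connection probabilities and the constant 10 in the choice of $\rho$ are all hypothetical. The condition $\rho \gg \log (n)/n$ is asymptotic, so a single finite $n$ can only hint at it.

```python
import math
import random


def simulate_sbm_mcar(n, block_probs, connect, rho, seed=0):
    """Simulate an undirected Bernoulli SBM, then observe each dyad
    independently with probability rho (MCAR dyad sampling).

    Returns (z, y, r): block labels, edge values keyed by dyad (i, j)
    with i < j, and observation indicators keyed the same way.
    """
    rng = random.Random(seed)
    blocks = list(range(len(block_probs)))
    # Latent block membership of every node.
    z = rng.choices(blocks, weights=block_probs, k=n)
    y, r = {}, {}
    for i in range(n):
        for j in range(i + 1, n):
            # Edge value depends only on the blocks of its endpoints.
            y[(i, j)] = 1 if rng.random() < connect[z[i]][z[j]] else 0
            # MCAR: the sampling indicator ignores both y and z.
            r[(i, j)] = 1 if rng.random() < rho else 0
    return z, y, r


def observed_fraction(r):
    """Share of dyads that were actually sampled."""
    return sum(r.values()) / len(r)


n = 200
rho = 10 * math.log(n) / n  # hypothetical choice, above log(n)/n
z, y, r = simulate_sbm_mcar(n, [0.5, 0.5], [[0.6, 0.1], [0.1, 0.6]], rho)
# Only the dyads with r == 1 would be available to an estimator.
y_obs = {dyad: value for dyad, value in y.items() if r[dyad] == 1}
```

Here `y_obs` plays the role of the observed part of the network. An estimator working under this design would see `y_obs` and `r` but never the entries of `y` with `r == 0`.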

Highlights

  • For the last decade, statistical network analysis has been a very active research topic, and the statistical modeling of networks has found many applications in the social sciences and biology, for example Aicher et al (2014), Barbillon et al (2015), Mariadassou et al (2010), Wasserman and Faust (1994) and Zachary (1977). Many random graph models have been widely studied, from either a theoretical or an empirical point of view

  • In Celisse et al (2012), consistency of the MLE and VE is proven, but asymptotic normality requires that the estimators converge at rate at least $n^{-1}$, which is not proven in that paper; some results were available for particular cases

  • According to Equation (2.2), if the sampling design is missing completely at random (MCAR), maximising $p_{\theta,\psi}(y^o, z, r)$ or $p_{\theta,\psi}(y^o, r)$ in $\theta$ is equivalent to maximising $p_\theta(y^o)$ in $\theta$. This corresponds to the notion of ignorability defined in Rubin (1976)
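
The ignorability statement in the last highlight can be sketched for the observed likelihood as follows. Equation (2.2) itself is not reproduced on this page, so this is the standard factorisation argument rather than the paper's exact display. The symbol $y^m$ for the unobserved entries is an assumption introduced here, and the integral should be read as a sum when edge values are discrete.

```latex
\begin{align*}
p_{\theta,\psi}(y^o, r)
  &= \sum_{z} \int p_\theta(y^o, y^m, z)\, p_\psi(r \mid y^o, y^m, z)\, dy^m \\
  &= p_\psi(r) \sum_{z} \int p_\theta(y^o, y^m, z)\, dy^m
     && \text{(MCAR: } p_\psi(r \mid y, z) = p_\psi(r)\text{)} \\
  &= p_\psi(r)\, p_\theta(y^o).
\end{align*}
% With independent dyad sampling at rate \rho (so \psi = \rho):
\[
p_\rho(r) = \prod_{\text{dyads } (i,j)} \rho^{\,r_{ij}} (1-\rho)^{1-r_{ij}} .
\]
```

Since $p_\psi(r)$ does not involve $\theta$, the maximiser of $p_{\theta,\psi}(y^o, r)$ in $\theta$ coincides with the maximiser of $p_\theta(y^o)$.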

Summary

Introduction

Statistical network analysis has been a very active research topic, and the statistical modeling of networks has found many applications in the social sciences and biology, for example Aicher et al (2014), Barbillon et al (2015), Mariadassou et al (2010), Wasserman and Faust (1994) and Zachary (1977). In Celisse et al (2012), consistency of the MLE and VE is proven, but asymptotic normality requires that the estimators converge at rate at least $n^{-1}$, which is not proven in that paper; some results were available for particular cases (affiliation, for example). There is a strong asymmetry between the presence of an edge and its absence: the lack of proof that an edge exists is taken as proof that the edge does not exist, and edges with uncertain status are treated as nonexistent in the graph. This is the strategy adopted in most sparse asymptotic settings, where the density of edges goes to 0 asymptotically (Bickel et al, 2013). Technical lemmas and details of the proofs are available in the appendices.

Stochastic Block Model
Missing data for SBM
Sampling design examples
Observed-likelihoods
Models and assumptions
Identifiability
Subexponential variables
Symmetry
Other definitions
Complete-observed model
Main result
Variational and Maximum Likelihood Estimates
ML estimator
Variational estimator
Log-likelihood ratios
High level view of the proof
Global control
Local control
Proof of the main result
Discussion