Abstract

This paper advances the high-dimensional frontier for network clustering. In the high-dimensional Stochastic Blockmodel for a random network, the number of clusters (or blocks) K grows with the number of nodes N. Previous authors have studied the statistical estimation performance of spectral clustering and the maximum likelihood estimator under the high-dimensional model, but they do not allow K to grow faster than N^{1/2}. We study a model where, ignoring log terms, K can grow proportionally to N. Since the number of clusters must be smaller than the number of nodes, no reasonable model allows K to grow faster; thus, our asymptotic results are the “highest” dimensional. To push the asymptotic setting to this extreme, we make additional assumptions that are motivated by empirical observations in physical anthropology [1] and an in-depth study of massive empirical networks [2]. Furthermore, we develop a regularized maximum likelihood estimator that performs well in the highest dimensional model. We prove that, under certain conditions, the proportion of nodes that the regularized estimator misclusters converges to zero. This is the first paper to explicitly introduce and demonstrate the advantages of statistical regularization in a parametric form for network analysis.
