Abstract

We present a Bayesian nonparametric Poisson factorization model for dense network data with an unknown and potentially growing number of overlapping communities. The construction is based on completely random measures and allows the number of communities either to increase with the number of nodes at a specified logarithmic or polynomial rate, or to be bounded. We develop asymptotics for the number and size of the communities of the network and derive a Markov chain Monte Carlo algorithm that targets the exact posterior distribution for this model. The usefulness of the approach is illustrated on various real networks.

Highlights

  • Nonnegative matrix factorization (NMF) methods (Paatero and Tapper 1994; Lee and Seung 2001) aim to find a latent representation of a positive n × m matrix A as a sum of K nonnegative factors

  • We focus on the application to network analysis, where m = n and the n × n count matrix A, the adjacency matrix, represents the number of directed or undirected interactions between n individuals; the latent factors may be interpreted as latent and potentially overlapping communities (Ball et al 2011), such as members of a sports team or other social circles

  • Zhou et al (2012), Gopalan et al (2014) and Zhou (2015) proposed Bayesian nonparametric approaches that allow the number of latent factors to be estimated from the data, and to grow unboundedly with the size n of the matrix

Summary

Introduction

Nonnegative matrix factorization (NMF) methods (Paatero and Tapper 1994; Lee and Seung 2001) aim to find a latent representation of a positive n × m matrix A as a sum of K nonnegative factors. We consider binary data, where the matrix records the presence or absence of a directed or undirected link between individuals. Poisson factorization approaches require the user to set the number K of latent factors, which is typically assumed to be independent of the sample size n. To address this limitation, Zhou et al (2012), Gopalan et al (2014) and Zhou (2015) proposed Bayesian nonparametric approaches that allow the number of latent factors to be estimated from the data and to grow unboundedly with the size n of the matrix.
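
To make the "sum of K nonnegative factors" concrete, the following is a minimal simulation sketch of a finite-K Poisson factorization for a directed network. It is not the model of the paper: K is fixed here, the gamma priors on the weights, the names W and V, and the rule "a link exists when the latent count is positive" are all illustrative assumptions.

```python
import numpy as np


def simulate_poisson_factorization(n, K, rng, shape=0.5, scale=1.0):
    """Simulate a directed network from a finite-K Poisson factorization.

    All modeling choices here are assumptions made for illustration:
    gamma-distributed weights, a fixed K, and thresholding counts at zero.
    """
    # W[i, k] >= 0: hypothetical "outgoing" affiliation of node i to community k
    W = rng.gamma(shape, scale, size=(n, K))
    # V[j, k] >= 0: hypothetical "incoming" affiliation of node j to community k
    V = rng.gamma(shape, scale, size=(n, K))
    # Rate matrix: sum over k of the rank-one nonnegative factors W[:, k] V[:, k]^T
    rates = W @ V.T
    # Latent interaction counts, one Poisson draw per ordered pair (i, j)
    counts = rng.poisson(rates)
    # Binary adjacency: a link is present when at least one interaction occurred
    binary = (counts > 0).astype(int)
    return W, V, rates, counts, binary


rng = np.random.default_rng(0)
n, K = 6, 3
W, V, rates, counts, binary = simulate_poisson_factorization(n, K, rng)
```

Under these assumptions, the probability of a link from i to j is 1 - exp(-rates[i, j]), so nodes that share heavily weighted communities are more likely to be connected. In a nonparametric version, K would not be fixed in advance.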

Specific model
Related work
General model
Specific case of the GGP
Marginal distribution and simulation
Posterior characterization
Slice sampler for posterior inference
Experiments
Synthetic datasets
Political blogs
Wikipedia topcats
Deezer
Discussion
Proofs of Section 3
Upper bound
B Gibbs sampler
Gibbs sampler step 1 for weighted graph on observed entries
Gibbs sampler step 1 for unweighted graph on observed entries
Undirected graph
Weighted graph
Prediction
Proof for the Gibbs step 2
Sampling from the inhomogeneous CRM
Findings