Abstract
Latent community discovery that combines links and contents of a text-associated network has drawn more attention with the advance of social media. Most of the previous studies aim at detecting densely connected communities and are not able to identify general structures, e.g., bipartite structure. Several variants based on the stochastic block model are more flexible for exploring general structures by introducing link probabilities between communities. However, these variants cannot identify the degree distributions of real networks due to a lack of modeling of the differences among nodes, and they are not suitable for discovering communities in text-associated networks because they ignore the contents of nodes. In this paper, we propose a popularity-productivity stochastic block (PPSB) model by introducing two random variables, popularity and productivity, to model the differences among nodes in receiving links and producing links, respectively. This model has the flexibility of existing stochastic block models in discovering general community structures and inherits the richness of previous models that also exploit popularity and productivity in modeling the real scale-free networks with power law degree distributions. To incorporate the contents in text-associated networks, we propose a combined model which combines the PPSB model with a discriminative model that models the community memberships of nodes by their contents. We then develop expectation-maximization (EM) algorithms to infer the parameters in the two models. Experiments on synthetic and real networks have demonstrated that the proposed models can yield better performances than previous models, especially on networks with general structures.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have