Abstract
We develop a principled methodology to infer assortative communities in networks based on a nonparametric Bayesian formulation of the planted partition model. We show that this approach succeeds in finding statistically significant assortative modules in networks, unlike alternatives such as modularity maximization, which systematically overfits both in artificial as well as in empirical examples. In addition, we show that our method is not subject to a resolution limit, and can uncover an arbitrarily large number of communities, as long as there is statistical evidence for them. Our formulation is amenable to model selection procedures, which allow us to compare it to more general approaches based on the stochastic block model, and in this way reveal whether assortativity is in fact the dominating large-scale mixing pattern. We perform this comparison with several empirical networks, and identify numerous cases where the network's assortativity is exaggerated by traditional community detection methods, and we show how a more faithful degree of assortativity can be identified.
Highlights
Community detection is one of the most central methods in network science [1,2], and it consists in the algorithmic partition of the nodes of a network into cohesive groups, according to a mathematical definition of this concept
Can uncover communities even when their number is unknown, without overfitting. We show that it does not suffer from the resolution limit present in other approaches, such as modularity maximization [19], and it can find an arbitrarily large number of communities, provided they are statistically significantly
We have described how to perform a nonparametric Bayesian inference of the planted partition generative model, resulting in a principled community detection algorithm tailored for assortative structures
Summary
Community detection is one of the most central methods in network science [1,2], and it consists in the algorithmic partition of the nodes of a network into cohesive groups, according to a mathematical definition of this concept (for which there are many). [20], the only statement that can be made is that there exists a particular choice of parameters λin, λout and θ such that maximizing modularity with the appropriate choice of γ and the PP likelihood conditioned on these parameters will yield the same partition Since these parameters are unknown in practice, and are in general inconsistent with maximum likelihood estimation, the relevance of this equivalence is arguably limited. Neither approach considered above, i.e., maximum likelihood inference of the PP model and modularity maximization, offers a robust method to uncover community structure in networks. We will demonstrate this problem with some simple examples, but before we do so we turn instead to a Bayesian approach, which includes the correct penalization of model complexity, and addresses the overfitting problem at its root, in a manner analogous to what has been done for the general SBM [15], as we describe in the session
Published Version (
Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have