Abstract
Many graph clustering quality functions suffer from a resolution limit, namely the inability to find small clusters in large graphs. So-called resolution-limit-free quality functions do not have this limit. This property was previously introduced for hard clustering, that is, graph partitioning. We investigate the resolution-limit-free property in the context of non-negative matrix factorization (NMF) for hard and soft graph clustering. To use NMF in the hard clustering setting, a common approach is to assign each node to its highest-membership cluster. We show that in this case symmetric NMF is not resolution-limit free, but that it becomes so when hardness constraints are used as part of the optimization. The resulting function is strongly linked to the constant Potts model. In soft clustering, nodes can belong to more than one cluster, with varying degrees of membership. In this setting, resolution-limit freeness turns out to be too strong a property. Therefore we introduce locality, which roughly states that changing one part of the graph does not affect the clustering of other parts of the graph. We argue that this is a desirable property, provide conditions under which NMF quality functions are local, and propose a novel class of local probabilistic NMF quality functions for soft graph clustering.
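To make the common hard-clustering pipeline concrete, the sketch below factorizes a symmetric adjacency matrix A as A ≈ HHᵀ with H ≥ 0 and then assigns each node to its highest-membership cluster, i.e. the argmax of its row of H. The damped multiplicative update (factor 1/2) is one commonly used rule for symmetric NMF and is an assumption here, not necessarily the optimizer used in the paper; the names `symnmf`, `hard_assign`, and `clique_pair` are hypothetical, and the hardness-constrained variant is not implemented.

```python
import random


def symnmf(adj, k, iters=300, seed=0, eps=1e-12):
    """Approximate a symmetric non-negative matrix adj by H H^T with H >= 0.

    Assumed update rule (damped multiplicative update):
        H <- H * (1/2 + 1/2 * (A H) / (H H^T H))
    """
    n = len(adj)
    rng = random.Random(seed)
    # Strictly positive start: multiplicative updates cannot revive exact zeros.
    h = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        # A H  (n x k)
        ah = [[sum(adj[i][l] * h[l][c] for l in range(n)) for c in range(k)]
              for i in range(n)]
        # H^T H  (k x k)
        hth = [[sum(h[l][a] * h[l][b] for l in range(n)) for b in range(k)]
               for a in range(k)]
        # H (H^T H)  (n x k)
        hhth = [[sum(h[i][a] * hth[a][c] for a in range(k)) for c in range(k)]
                for i in range(n)]
        h = [[h[i][c] * (0.5 + 0.5 * ah[i][c] / (hhth[i][c] + eps))
              for c in range(k)] for i in range(n)]
    return h


def hard_assign(h):
    """Assign each node to the cluster with its largest membership value."""
    return [max(range(len(row)), key=row.__getitem__) for row in h]


def clique_pair(size):
    """Hypothetical test graph: two disjoint cliques, no self-loops."""
    n = 2 * size
    return [[1.0 if i != j and (i < size) == (j < size) else 0.0
             for j in range(n)] for i in range(n)]


labels = hard_assign(symnmf(clique_pair(4), k=2))
print(labels)
```

The argmax step discards all membership information except the largest entry per node, which is the kind of post-hoc hardening the abstract contrasts with imposing hardness constraints during optimization.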
Highlights
Graph clustering, also known as network community detection, is an important problem with real-life applications in diverse disciplines such as the life and social sciences [1,2].
We focus on the resolution-limit-free property, a property of hard graph clustering recently introduced by Traag, Van Dooren, and Nesterov [4].
In Section III we show that hard clustering based on non-negative matrix factorization (NMF), where each node is assigned to its highest-membership cluster, is in general not resolution-limit free.
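For comparison, the constant Potts model (CPM) of Traag, Van Dooren, and Nesterov [4], to which the abstract links hardness-constrained NMF, scores a hard partition σ by Q(σ) = Σ_{i,j} (A_ij − γ) δ(σ_i, σ_j), the negative of the CPM Hamiltonian, where γ is a resolution parameter. The sketch below evaluates this quality on a small hypothetical graph of two triangles joined by one edge; the function name `cpm_quality` and the choice γ = 0.5 are our own.

```python
def cpm_quality(adj, labels, gamma):
    """Constant Potts model quality (to be maximized) of a hard partition.

    Sums (A_ij - gamma) over all ordered node pairs (i, j), including i == j,
    whose nodes share a cluster label.
    """
    n = len(adj)
    return sum(adj[i][j] - gamma
               for i in range(n) for j in range(n)
               if labels[i] == labels[j])


# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0

print(cpm_quality(adj, [0, 0, 0, 1, 1, 1], gamma=0.5))  # 3.0
print(cpm_quality(adj, [0] * 6, gamma=0.5))             # -4.0
print(cpm_quality(adj, list(range(6)), gamma=0.5))      # -3.0
```

With γ = 0.5, the two-triangle split scores highest of the three partitions shown. Each cluster's contribution depends only on the edges and node count inside that cluster, which is the intuition behind CPM being resolution-limit free.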
Summary
Graph clustering, also known as network community detection, is an important problem with real-life applications in diverse disciplines such as the life and social sciences [1,2]. In this paper we aim to contribute to this area by studying desirable locality properties of quality functions for hard and soft graph clustering. We focus on the resolution-limit-free property, a property of hard graph clustering recently introduced by Traag, Van Dooren, and Nesterov [4]. Our goal is to investigate resolution-limit freeness and other locality properties of non-negative matrix factorization (NMF) quality functions for graph clustering. NMF [7,8] is a popular machine learning method, initially used to learn the parts of objects such as human faces and text documents. It finds two non-negative matrices whose product provides a good approximation of the input matrix. We also introduce a novel class of probabilistic NMF quality functions that are local and do not suffer from a resolution limit.
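The factorization described above, V ≈ WH with W and H non-negative, can be computed with the classic multiplicative updates of Lee and Seung for the squared Frobenius error. The sketch below is a minimal pure-Python version under that assumption; it does not reproduce the paper's graph clustering quality functions, and the names `nmf`, `frobenius_error`, `matmul`, and `transpose`, as well as the example matrix, are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-list matrix product of a (m x p) and b (p x q)."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iters=500, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for min ||V - W H||_F^2 with W, H >= 0."""
    m, n = len(v), len(v[0])
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


v = [[1.0, 2.0, 0.0],
     [2.0, 4.0, 0.0],
     [0.0, 0.0, 3.0]]
w, h, errors = nmf(v, k=2)
print(errors[0], errors[-1])
```

Because each update multiplies by a non-negative ratio, W and H stay non-negative throughout, and Lee and Seung showed that the squared error is non-increasing under these updates.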