Abstract

Graph clustering, or community detection, is the task of identifying groups of closely related objects in a large network. In this paper we introduce a new community-detection framework called LambdaCC that is based on a specially weighted version of correlation clustering. A key component in our methodology is a clustering resolution parameter, $\lambda$, which implicitly controls the size and structure of clusters formed by our framework. We show that, by increasing this parameter, our objective effectively interpolates between two different strategies in graph clustering: finding a sparse cut and forming dense subgraphs. Our methodology unifies and generalizes a number of other important clustering quality functions including modularity, sparsest cut, and cluster deletion, and places them all within the context of an optimization problem that has been well studied from the perspective of approximation algorithms. Our approach is particularly relevant in the regime of finding dense clusters, as it leads to a 2-approximation for the cluster deletion problem. We use our approach to cluster several graphs, including large collaboration networks and social networks.

Highlights

  • Identifying groups of related entities in a network is a ubiquitous task across scientific disciplines

  • Common objective functions studied by theoretical computer scientists include normalized cut, sparsest cut, conductance, and edge expansion, all of which measure some version of the cut-to-size ratio for a single cluster in a graph

  • In our work we focus on a multiplicative scaling of the sparsest cut objective that we call the scaled sparsest cut, $\psi(S) = \phi(S)/n = \mathrm{cut}(S)/(|S||\bar{S}|)$ with $\bar{S} = V \setminus S$, which is identical to sparsest cut in terms of multiplicative approximations
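The scaled sparsest cut in the last highlight can be evaluated directly from an edge list. The following is a minimal sketch, not the authors' code. It assumes that the second factor in the denominator is the size of the complement of S, and that the graph is simple and given as a list of node pairs. The function names and the toy graph are hypothetical.

```python
def cut_size(edges, S):
    """Number of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(1 for u, v in edges if (u in S) != (v in S))


def scaled_sparsest_cut(edges, nodes, S):
    """psi(S) = cut(S) / (|S| * |complement of S|)."""
    S = set(S)
    complement = set(nodes) - S
    if not S or not complement:
        raise ValueError("S must be a nonempty proper subset of the nodes")
    return cut_size(edges, S) / (len(S) * len(complement))


def sparsest_cut(edges, nodes, S):
    """phi(S) = n * psi(S), using the identity psi(S) = phi(S) / n."""
    return len(set(nodes)) * scaled_sparsest_cut(edges, nodes, S)


# Hypothetical toy graph: two triangles joined by a single bridge edge (2, 3).
nodes = range(6)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Splitting along the bridge: cut = 1, |S| = 3, complement size = 3.
psi_triangle = scaled_sparsest_cut(edges, nodes, {0, 1, 2})

# Isolating a single node: cut = 2, |S| = 1, complement size = 5.
psi_single = scaled_sparsest_cut(edges, nodes, {0})
```

On this toy graph the bridge split scores 1/9 and the single-node split scores 2/5, so the objective prefers the balanced cut between the two triangles. Because the two objectives differ only by the constant factor n, any set that minimizes one also minimizes the other.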

Summary

Introduction

Identifying groups of related entities in a network is a ubiquitous task across scientific disciplines. This task is often called graph clustering, or community detection, and can be used, for example, to find similar proteins in a protein-interaction network or to group related organisms. Other standards of clustering quality put greater emphasis on the internal density of clusters. One example is the cluster deletion objective, which seeks to partition a graph into completely connected sets of nodes (cliques) by removing as few edges as possible. Let $G = (V, E)$ be an undirected, unweighted graph with $n$ nodes $V$ and $m$ edges $E$. For a set of nodes $S \subseteq V$, let $E_S$ denote the interior edge set of $S$, that is, the set of edges with both endpoints in $S$.
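To make the notation concrete, the sketch below computes the interior edge set of a node set and the cost of a cluster deletion solution. It is an illustration under stated assumptions, not the paper's algorithm: the graph is simple (no self-loops), the interior edge set is taken to mean the edges with both endpoints in S, the clusters passed in are assumed to partition the nodes, and all function names and the toy graph are hypothetical.

```python
def interior_edges(edges, S):
    """Edges with both endpoints in S, stored as unordered pairs."""
    S = set(S)
    return {frozenset((u, v)) for u, v in edges if u in S and v in S}


def is_clique(edges, S):
    """True if every pair of distinct nodes in S is joined by an edge."""
    S = set(S)
    return len(interior_edges(edges, S)) == len(S) * (len(S) - 1) // 2


def cluster_deletion_cost(edges, clusters):
    """Number of edges deleted when the graph is split into the given cliques.

    Raises ValueError if some cluster is not a clique, because such a
    partition is not a feasible cluster deletion solution.
    """
    kept = 0
    for cluster in clusters:
        if not is_clique(edges, cluster):
            raise ValueError(f"cluster {sorted(cluster)} is not a clique")
        kept += len(interior_edges(edges, cluster))
    total = len({frozenset(e) for e in edges})
    return total - kept


# Hypothetical toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Keeping both triangles intact deletes only the bridge edge.
cost = cluster_deletion_cost(edges, [{0, 1, 2}, {3, 4, 5}])
```

Here the two triangles are both cliques, so the partition is feasible and only the bridge edge is removed. A partition such as {0, 1, 2, 3} and {4, 5} is rejected because the first set is missing the edges (0, 3) and (1, 3).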

