Abstract
Community detection is often used to understand the structure of large and complex networks. One of the most popular algorithms for uncovering community structure is the so-called Louvain algorithm. We show that this algorithm has a major defect that largely went unnoticed until now: the Louvain algorithm may yield arbitrarily badly connected communities. In the worst case, communities may even be disconnected, especially when running the algorithm iteratively. In our experimental analysis, we observe that up to 25% of the communities are badly connected and up to 16% are disconnected. To address this problem, we introduce the Leiden algorithm. We prove that the Leiden algorithm yields communities that are guaranteed to be connected. In addition, we prove that, when the Leiden algorithm is applied iteratively, it converges to a partition in which all subsets of all communities are locally optimally assigned. Furthermore, by relying on a fast local move approach, the Leiden algorithm runs faster than the Louvain algorithm. We demonstrate the performance of the Leiden algorithm for several benchmark and real-world networks. We find that the Leiden algorithm is faster than the Louvain algorithm and uncovers better partitions, in addition to providing explicit guarantees.
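The defect described above, communities that are internally disconnected, can be checked for directly on any partition. The following is a minimal sketch, not the authors' implementation: it assumes an undirected graph given as a list of edge pairs and a partition given as a node-to-community dictionary. The names `disconnected_communities`, `edges` and `community_of` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def disconnected_communities(edges, community_of):
    """Return the set of community labels whose induced subgraph is not connected.

    edges: iterable of (u, v) pairs of an undirected graph (assumed input format).
    community_of: dict mapping every node to its community label.
    """
    # Keep only edges whose endpoints lie in the same community.
    adjacency = defaultdict(set)
    for u, v in edges:
        if community_of[u] == community_of[v]:
            adjacency[u].add(v)
            adjacency[v].add(u)

    # Group nodes by community.
    members = defaultdict(list)
    for node, label in community_of.items():
        members[label].append(node)

    disconnected = set()
    for label, nodes in members.items():
        # Breadth-first search from one member, using intra-community edges only.
        seen = {nodes[0]}
        queue = deque([nodes[0]])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        # If some member was never reached, the community is disconnected.
        if len(seen) < len(nodes):
            disconnected.add(label)
    return disconnected
```

For example, on the path 0-1-2 with nodes 0 and 2 assigned to one community and node 1 to another, the first community is reported as disconnected, because its two members are linked only through a node outside the community.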
Highlights
In many complex networks, nodes cluster and form relatively dense groups, often called communities [1, 2]. Such a modular structure is usually not known beforehand.
We show that the Louvain algorithm has a major problem, for both modularity and the Constant Potts Model (CPM).
We find that the Leiden algorithm is faster than the Louvain algorithm, owing to its fast local move approach.
Summary
Nodes cluster and form relatively dense groups, often called communities [1, 2]. Such a modular structure is usually not known beforehand. One of the best-known methods for community detection is called modularity [3]. This method tries to maximise the difference between the actual number of edges in a community and the expected number of such edges. Here, K_c is the sum of the degrees of the nodes in community c and m is the total number of edges in the network. This way of defining the expected number of edges is based on the so-called configuration model. We show that the Louvain algorithm has a major problem, for both modularity and the Constant Potts Model (CPM). We name our algorithm the Leiden algorithm, after the location of its authors.
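To make the modularity description above concrete, here is a minimal sketch that scores a partition using the standard Newman form of modularity, Q = Σ_c [e_c / m − (K_c / (2m))²], where e_c is the number of edges inside community c. The normalisation is an assumption: the summary omits the formula, and the paper's own version may differ by a constant factor or include a resolution parameter. The function and argument names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Standard modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once (assumed format).
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)  # total number of edges in the network
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")

    internal_edges = defaultdict(int)  # e_c: edges with both endpoints in c
    degree_sum = defaultdict(int)      # K_c: sum of degrees of the nodes in c

    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        # Each edge adds one to the degree of each endpoint.
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal_edges[cu] += 1

    # Actual fraction of edges inside c minus the fraction expected
    # under the configuration model, summed over all communities.
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )
```

As a usage example, take two triangles joined by a single edge (7 edges in total). Splitting the graph into the two triangles gives e_c = 3 and K_c = 7 for each community, so Q = 2 × (3/7 − 1/4) = 5/14. Placing all nodes in one community gives Q = 0, since the actual and expected edge counts then coincide.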