Abstract

Community structure detection is important because it helps reveal the relationship between the function and the topological structure of a network. Many community detection algorithms have been proposed, but how to incorporate prior knowledge into the detection process remains a challenging problem. In this paper, we propose a semi-supervised community detection algorithm that makes full use of must-link and cannot-link constraints to guide the detection process and thereby extracts high-quality community structures from networks. To acquire high-quality must-link and cannot-link constraints, we also propose a semi-supervised component generation algorithm based on active learning, which selects, step by step, the nodes with maximum utility for the proposed semi-supervised community detection algorithm and then generates the must-link and cannot-link constraints by querying a noiseless oracle. Extensive experiments show that introducing active learning into community detection is effective: the proposed method extracts high-quality community structures from networks and significantly outperforms the comparison methods.
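To make the constraint terminology concrete, the following is a minimal sketch of how must-link and cannot-link pairs could be derived once a noiseless oracle has been queried. It assumes the oracle returns a community label for each queried node; the abstract does not specify the query protocol, so the function name `pairwise_constraints` and the input format are hypothetical and are not the authors' implementation.

```python
from itertools import combinations


def pairwise_constraints(oracle_labels):
    """Derive must-link / cannot-link pairs from oracle answers.

    oracle_labels: dict mapping each queried node to the community label
    returned by the oracle (assumed format; the oracle is assumed noiseless).
    """
    must_link, cannot_link = set(), set()
    # Compare every unordered pair of queried nodes exactly once.
    for u, v in combinations(sorted(oracle_labels), 2):
        if oracle_labels[u] == oracle_labels[v]:
            must_link.add((u, v))    # same community: must be grouped together
        else:
            cannot_link.add((u, v))  # different communities: must be separated
    return must_link, cannot_link


# Toy example: nodes 0 and 1 share a community, node 2 does not.
labels = {0: "a", 1: "a", 2: "b"}
must_link, cannot_link = pairwise_constraints(labels)
```

Because the oracle is assumed noiseless, the resulting constraints are mutually consistent: must-link is transitive within a label, and no pair appears in both sets.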

Highlights

  • Community structures are significant features observed in many complex networks, meaning that the nodes in a network can be divided naturally into groups, within which connections are relatively dense but between which connections are much sparser

  • We present the idea of the proposed semi-supervised component generation algorithm based on active learning

  • We carried out two types of experiments: one to test the ability of the semi-supervised community detection algorithm based on the must-link and cannot-link constraints, and the other to demonstrate the utility of the semi-supervised component generation algorithm based on active learning


Summary

Introduction

Community structures are significant features observed in many complex networks: the nodes of a network can be divided naturally into groups, within which connections are relatively dense and between which connections are much sparser. Methods based on random walks exploit the tendency of a random walker to become trapped inside a community, rather than to cross community boundaries, within a limited number of steps, and use this tendency to identify community structures. Such methods have been applied successfully in many applications [31,32,33,34,35,36,37,38]. Two representative state-of-the-art random-walk algorithms are Infohiermap (short for Hierarchical Infomap [36]) [37], which reveals the best hierarchical community structure of a network by finding the shortest multilevel description of the random walker, and PPC (Personalized PageRank Clustering) [38], which combines random walks with modularity to identify community structures efficiently.
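The trapping behaviour that random-walk methods rely on can be illustrated with a small sketch. It is neither Infohiermap nor PPC; it only computes the exact distribution of a simple random walk on a toy graph, chosen here as an assumption, made of two 4-node cliques joined by a single bridge edge. The helper names `build_adjacency` and `walk_distribution` are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges, n):
    """Adjacency lists for an undirected graph on nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def walk_distribution(adj, start, steps):
    """Exact node distribution of a simple random walk after `steps` steps."""
    dist = [0.0] * len(adj)
    dist[start] = 1.0
    for _ in range(steps):
        nxt = [0.0] * len(adj)
        for u, p in enumerate(dist):
            if p:
                share = p / len(adj[u])  # move to each neighbour uniformly
                for v in adj[u]:
                    nxt[v] += share
        dist = nxt
    return dist


# Toy graph (assumption): two dense communities linked by one edge (3, 4).
community_a = [0, 1, 2, 3]
community_b = [4, 5, 6, 7]
edges = (list(combinations(community_a, 2))
         + list(combinations(community_b, 2))
         + [(3, 4)])
adj = build_adjacency(edges, 8)

# Probability that a walker started at node 0 is still in its own community.
dist = walk_distribution(adj, start=0, steps=3)
stay_prob = sum(dist[v] for v in community_a)
```

After a few steps most of the probability mass is still inside the starting community, because only one of the edges leaving it crosses the boundary. Real algorithms turn this signal into a partition in different ways, for example through description length or modularity as mentioned above.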

