Abstract
Community structure detection is of great importance because it can help in discovering the relationship between the function and the topology structure of a network. Many community detection algorithms have been proposed, but how to incorporate the prior knowledge in the detection process remains a challenging problem. In this paper, we propose a semi-supervised community detection algorithm, which makes full utilization of the must-link and cannot-link constraints to guide the process of community detection and thereby extracts high-quality community structures from networks. To acquire the high-quality must-link and cannot-link constraints, we also propose a semi-supervised component generation algorithm based on active learning, which actively selects nodes with maximum utility for the proposed semi-supervised community detection algorithm step by step, and then generates the must-link and cannot-link constraints by accessing a noiseless oracle. Extensive experiments were carried out, and the experimental results show that the introduction of active learning into the problem of community detection makes a success. Our proposed method can extract high-quality community structures from networks, and significantly outperforms other comparison methods.
Highlights
Community structures are significant features observed in many complex networks, meaning that the nodes in a network can be divided naturally into groups, within which connections are relatively dense but between which connections are much sparser
Active learning algorithm In this subsection, we present the idea of the proposed semisupervised component generation algorithm based on active learning
We carried out two types of experiments: one for testifying the ability of the semi-supervised community detection algorithm based on the must-link and cannot-link constraints, and the other for demonstrating the utility of the semi-supervised component-generation algorithm based on active learning
Summary
Community structures are significant features observed in many complex networks, meaning that the nodes in a network can be divided naturally into groups, within which connections are relatively dense but between which connections are much sparser. Methods based on random walk utilize the tendency of a random walker to identify community structures from networks, the walker tends to be trapped in communities rather than walks across community boundaries within a limited number of steps. Such methods have been applied in many applications successfully [31,32,33,34,35,36,37,38]. The Infohiermap (abbreviation for Hierarchical Infomap [36]) algorithm [37], which reveals the best hierarchical community structures in networks by finding the shortest multilevel descriptions of the random walker, and the PPC (acronym for Personalized PageRank Clustering) algorithm [38], which combines the random walks and the modularity to efficiently identify the community structures of networks, are two representatives of the state-of-the-art algorithms based on random walk
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.