Abstract
Community detection is a fundamental procedure in the analysis of network data. Despite decades of research, there is still no consensus on the definition of a community. To analytically test whether a candidate community in a weighted network is real, we present a general formulation from a significance-testing perspective. In this formulation, each edge weight is modeled as a censored observation, reflecting the noisy nature of real networks. The weights of missing links are incorporated as well: they are set to zero under the assumption that they are truncated or unobserved. The community significance assessment problem is then formulated as a two-sample test on censored data. More precisely, the log-rank test is applied to two sets of augmented edge weights: the internal weight set and the external weight set. The proposed approach is evaluated on both weighted and unweighted networks. The experimental results show that our method can outperform previously proposed, widely used evaluation metrics on the task of validating individual communities.
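The two augmented weight sets described in the Abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the internal set covers every unordered pair of community members, the external set covers every pair with one endpoint inside and one outside the community, and missing links contribute a weight of 0. The function name `augmented_weight_sets` and the `frozenset`-keyed weight dictionary are hypothetical choices made for this example.

```python
from itertools import combinations


def augmented_weight_sets(nodes, weights, community):
    """Collect internal and external edge weights for a candidate community.

    Hypothetical helper (not from the paper).
    nodes:     iterable of all node ids in the network
    weights:   dict mapping frozenset({u, v}) -> edge weight
               (use 1.0 for every edge of an unweighted network)
    community: set of node ids forming the candidate community

    Node pairs with no edge ("missing links") contribute a weight of 0.0.
    """
    members = sorted(community)
    outsiders = sorted(set(nodes) - set(community))

    # Internal set (assumed): every unordered pair of community members.
    internal = [weights.get(frozenset(pair), 0.0)
                for pair in combinations(members, 2)]

    # External set (assumed): every pair with one endpoint inside, one outside.
    external = [weights.get(frozenset((u, v)), 0.0)
                for u in members for v in outsiders]
    return internal, external


# Toy network: a path 1-2-3-4 with weights 3, 1, 2; candidate community {1, 2}.
w = {frozenset((1, 2)): 3.0, frozenset((2, 3)): 1.0, frozenset((3, 4)): 2.0}
internal, external = augmented_weight_sets([1, 2, 3, 4], w, {1, 2})
print(internal)  # [3.0]
print(external)  # [0.0, 0.0, 1.0, 0.0]
```

Enumerating all node pairs is quadratic in the community size, which is acceptable for a sketch; a large-scale implementation would more likely count missing links instead of materializing every zero.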
Highlights
Community detection is a fundamental procedure in the analysis of network data
We calculate Pearson’s correlation coefficient between two vectors, each composed of the validation index values on a set of identified communities
On the unweighted networks, our method is significantly better than OSLOM and CCME according to the Bonferroni–Dunn test and the Nemenyi test
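The correlation mentioned in the Highlights compares two vectors of index values computed on the same set of communities. The sketch below computes Pearson's coefficient directly from its definition. The function name `pearson` is illustrative, and this summary does not specify which quantity the second vector holds. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Covariance and standard deviations, all scaled by the same factor n,
    # which cancels in the ratio.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)
```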
Summary
Community detection is a fundamental procedure in the analysis of network data. Despite decades of research, there is still no consensus on the definition of a community. Numerous community detection algorithms have been developed from different perspectives [1,2,3,4]. Despite these developments, the problem of deciding whether a detected community is real is far from resolved. Several research efforts have sought to analytically assess the realness of a candidate community, such as OSLOM [7,8], ESSC [9], DSC [10], CCME [11], and FOCS [12]. Among these methods, only OSLOM and CCME validate communities in weighted networks. We formulate the community significance assessment problem in edge-weighted networks as a non-parametric two-sample test on censored data, and we use the widely adopted log-rank test [16] for this task.
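The Summary names the log-rank test as the two-sample test on censored data. Below is a minimal sketch of the textbook two-sample log-rank statistic, with edge weights playing the role of survival "times". It is not the authors' implementation. In particular, this summary does not state how the paper assigns censoring indicators to edge weights, so the `events` flags are left to the caller and any particular encoding is an assumption. The function name `logrank_test` is hypothetical.

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-sample log-rank test for right-censored data (textbook form).

    times*:  observed values (e.g. edge weights used as "times")
    events*: 1 if the value is an observed event, 0 if right-censored
    Returns (z, chi2, p_value). z > 0 means group 1 has more events than
    expected under the null, i.e. group 1 tends to take smaller values.
    """
    data = [(t, e, 0) for t, e in zip(times1, events1)] + \
           [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Risk set at t: every observation whose value is >= t.
        n = sum(1 for s, _, _ in data if s >= t)
        n1 = sum(1 for s, _, g in data if s >= t and g == 0)
        # Events occurring exactly at t, pooled and in group 1.
        d = sum(1 for s, e, _ in data if s == t and e)
        d1 = sum(1 for s, e, g in data if s == t and e and g == 0)

        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1, with the tie correction.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 0.0, 1.0  # no information to compare the groups
    z = observed_minus_expected / math.sqrt(variance)
    chi2 = z * z
    # Chi-square with 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return z, chi2, p_value
```

With larger weights read as "surviving longer", passing the internal set as group 1 gives z < 0 when internal weights tend to exceed external ones. The loop rescans the data at every event time, which is quadratic and meant only for illustration; `lifelines.statistics.logrank_test` is an established library alternative.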