Abstract

Community detection is a fundamental procedure in the analysis of network data. Despite decades of research, there is still no consensus on the definition of a community. To analytically test whether a candidate community in a weighted network is real, we present a general formulation from a significance testing perspective. In this formulation, each edge weight is modeled as a censored observation because real networks are noisy. The edge weights of missing links are incorporated as well: they are set to zero on the assumption that they are truncated or unobserved. The community significance assessment problem is then formulated as a two-sample test on censored data. More precisely, the Logrank test is applied to two sets of augmented edge weights: the internal weight set and the external weight set. The approach is evaluated on both weighted and un-weighted networks. The experimental results show that our method can outperform widely used evaluation metrics on the task of individual community validation.
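The two-sample formulation in the abstract can be illustrated with a minimal sketch of a standard Logrank test applied to two sets of edge weights. This is not the authors' implementation. The function name `logrank_test`, the toy weights, and the choice to flag zero-weight missing links as censored are all assumptions made for illustration; the abstract does not specify which observations carry the censoring flag.

```python
import math


def logrank_test(values_a, observed_a, values_b, observed_b):
    """Standard two-sample logrank test.

    values_*   : observed values (here: edge weights playing the role of "times")
    observed_* : True if the value is an uncensored event, False if censored
    Returns (chi2, p_value), with the p-value from a chi-square with 1 d.o.f.
    """
    data = [(v, o, 0) for v, o in zip(values_a, observed_a)]
    data += [(v, o, 1) for v, o in zip(values_b, observed_b)]

    # Distinct values at which at least one uncensored event occurs.
    event_points = sorted({v for v, o, _ in data if o})

    diff = 0.0  # sum over event points of (observed - expected) events in group A
    var = 0.0   # sum of hypergeometric variances
    for t in event_points:
        n_a = sum(1 for v, _, g in data if g == 0 and v >= t)  # at risk, group A
        n_b = sum(1 for v, _, g in data if g == 1 and v >= t)  # at risk, group B
        d_a = sum(1 for v, o, g in data if g == 0 and o and v == t)
        d_b = sum(1 for v, o, g in data if g == 1 and o and v == t)
        n, d = n_a + n_b, d_a + d_b
        diff += d_a - d * n_a / n
        if n > 1:
            var += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    if var == 0:
        return 0.0, 1.0
    chi2 = diff * diff / var
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical toy data (not from the paper). Zero weights stand for missing
# links and are flagged as censored here purely as an illustrative assumption.
internal = [0.9, 0.8, 0.7, 0.0, 0.6]
internal_obs = [w > 0 for w in internal]
external = [0.2, 0.0, 0.1, 0.0, 0.3, 0.0]
external_obs = [w > 0 for w in external]

chi2, p_value = logrank_test(internal, internal_obs, external, external_obs)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

A small p-value would suggest that the internal and external weight distributions differ, which is the sense in which a candidate community is called significant. With a different censoring convention the flags passed to the function would change, but the test itself would not.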

Highlights

  • Community detection is a fundamental procedure in the analysis of network data

  • We calculate the Pearson’s correlation coefficient between two vectors, where each vector is composed of the validation index values on a set of identified communities

  • On the un-weighted networks, our method is significantly better than OSLOM and CCME according to the Bonferroni–Dunn test and the Nemenyi test
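The correlation mentioned in the highlights can be sketched as follows. This is a minimal illustration, assuming each vector holds one validation index evaluated on the same ordered set of communities; the function name `pearson` and the sample values are hypothetical and do not come from the paper.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (norm_x * norm_y)


# Hypothetical index values on five identified communities (illustrative only).
index_a = [0.01, 0.20, 0.03, 0.50, 0.04]
index_b = [0.90, 0.40, 0.85, 0.10, 0.80]
print(f"Pearson r = {pearson(index_a, index_b):.3f}")
```

A coefficient near 1 or -1 would indicate that the two indices rank the communities in a consistently similar or opposite way, while a value near 0 would indicate little linear agreement.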


Summary

Introduction

Community detection is a fundamental procedure in the analysis of network data. Despite decades of research, there is still no consensus on the definition of a community. Numerous community detection algorithms have been developed from different perspectives [1,2,3,4]. Despite these developments, the issue of deciding whether a derived community is real or not is far from being resolved. Several research efforts have been conducted to analytically assess the realness of one candidate community, such as OSLOM [7,8], ESSC [9], DSC [10], CCME [11], and FOCS [12]. Among these methods, only OSLOM and CCME focus on validating a community in weighted networks. We formulate the community significance assessment problem in edge-weighted networks as a non-parametric two-sample test issue on censored data. We choose the popular Logrank test [16] to fulfill this task.

