Abstract

We provide a systematic approach to validating the results of clustering methods on weighted networks, in particular for cases where the existence of a community structure is unknown. Our validation comprises a set of criteria for assessing the significance and stability of the clusters. To test for cluster significance, we introduce a set of community scoring functions adapted to weighted networks and systematically compare their values to those obtained under a suitable null model. For this we propose a switching model that produces randomized graphs with weighted edges while keeping the degree distribution constant. To test for cluster stability, we introduce a nonparametric bootstrap method combined with similarity metrics derived from information theory and combinatorics. To assess the effectiveness of our clustering quality evaluation methods, we test them on synthetically generated weighted networks with a ground truth community structure of varying strength, based on the stochastic block model construction. When applied to the clusters of these synthetic ground truth networks, as well as to other weighted networks with known community structure, the proposed methods correctly identify the best performing algorithms, which suggests their adequacy for cases where the clustering structure is not known. We test our clustering validation methods on a varied collection of well-known clustering algorithms applied to the synthetically generated networks and to several real-world weighted networks. All our clustering validation methods are implemented in R and will be released in the upcoming package clustAnalytics.
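
To make the idea of a degree-preserving null model concrete, the following is a minimal Python sketch of a classic edge-switching randomization. It is an illustration under stated assumptions, not the switching model proposed in the paper: the function names (`switch_edges`, `degrees`, `norm`) are hypothetical, and the choice to carry each edge weight along with its rewired edge is an assumption made here for simplicity. Each accepted switch replaces edges (a, b) and (c, d) with (a, d) and (c, b), so every node keeps its degree, although node strengths (sums of incident weights) may change.

```python
import random
from collections import Counter


def norm(u, v):
    """Canonical key for an undirected edge."""
    return (u, v) if u < v else (v, u)


def degrees(weights):
    """Number of incident edges per node."""
    deg = Counter()
    for u, v in weights:
        deg[u] += 1
        deg[v] += 1
    return deg


def switch_edges(weights, n_switches, seed=0):
    """Randomize a simple undirected weighted graph by edge switching.

    weights: dict mapping (u, v) -> weight.
    Assumption: each weight travels with its rewired edge.
    """
    rng = random.Random(seed)
    weights = {norm(u, v): w for (u, v), w in weights.items()}
    keys = list(weights)
    done, attempts = 0, 0
    while done < n_switches and attempts < 100 * n_switches:
        attempts += 1
        i, j = rng.sample(range(len(keys)), 2)
        a, b = keys[i]
        c, d = keys[j]
        if rng.random() < 0.5:  # random orientation of the second edge
            c, d = d, c
        new1, new2 = norm(a, d), norm(c, b)
        # Reject switches that would create self-loops or duplicate edges.
        if a == d or c == b or new1 in weights or new2 in weights:
            continue
        w1 = weights.pop(keys[i])
        w2 = weights.pop(keys[j])
        weights[new1], weights[new2] = w1, w2
        keys[i], keys[j] = new1, new2
        done += 1
    return weights


# Small example: a ring of 8 nodes plus two chords, with arbitrary weights.
example = {(k, (k + 1) % 8): float(k + 1) for k in range(8)}
example[(0, 4)] = 10.0
example[(2, 6)] = 12.0
randomized = switch_edges(example, n_switches=20)
```

Comparing `degrees(example)` with `degrees(randomized)` confirms that the degree sequence is unchanged, while the placement of edges and weights differs.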

Highlights

  • Clustering of networks is a popular research field, and a wide variety of algorithms have been proposed over the years

  • To obtain a weighted stochastic block model (WSBM) graph, we propose a variation of the stochastic block model (SBM) that produces multigraphs, which can be converted into weighted graphs by setting each edge weight to the corresponding edge count

  • As explained in the Materials and Methods section, to test the significance of the clusters produced by a given clustering algorithm, we apply the scoring functions defined in “Community scoring functions” to the clustering produced on the original graph and on randomized versions obtained by the method described in “Randomized graph”
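
The weighted SBM construction mentioned in the highlights can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the source only states that a multigraph is produced and that edge counts become weights, so the Poisson distribution of edge counts, the `rates` matrix, and the names `sample_wsbm` and `poisson` are assumptions introduced here.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_wsbm(block_sizes, rates, seed=0):
    """Sample a weighted graph from a multigraph block model.

    block_sizes: number of nodes in each block.
    rates[r][s]: assumed mean number of parallel edges between a node
                 in block r and a node in block s (symmetric matrix).
    Returns (membership, weights), where weights maps (u, v) with u < v
    to the edge count of that pair, used as the edge weight.
    """
    rng = random.Random(seed)
    membership = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(membership)
    weights = {}
    for u in range(n):
        for v in range(u + 1, n):
            count = poisson(rng, rates[membership[u]][membership[v]])
            if count > 0:  # pairs with no edges are simply absent
                weights[(u, v)] = count
    return membership, weights


# Two blocks of 10 nodes; within-block pairs get more edges on average.
membership, weights = sample_wsbm(
    block_sizes=[10, 10],
    rates=[[2.0, 0.2], [0.2, 2.0]],
)
```

Raising the off-diagonal rates toward the diagonal ones weakens the planted community structure, which is one way to obtain ground truth networks of varying strength.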


Summary

Introduction

Clustering of networks is a popular research field, and a wide variety of algorithms have been proposed over the years. Determining how meaningful the results are can often be difficult, as can choosing which algorithm best suits a particular data set. This paper focuses on weighted networks (that is, those in which the connections between nodes have an assigned numerical value representing some property of the data), and we propose novel methods to validate the community partitions of these networks obtained by any given clustering algorithm. Our clustering validation methods focus on two of the most important aspects of cluster assessment: the significance and the stability of the resulting clusters.

How to cite this article: Arratia A, Renedo Mirambell M.

