Towards a Better Quality Metric for Graph Cluster Evaluation

Hélio Marcos Paz De Almeida ,Wagner Meira ,Dorgival Olavo Guedes Neto ,Mohammed J Zaki

doi:10.5753/jidm.2012.1460

Hélio Marcos Paz De Almeida , Wagner Meira + Show 2 more

https://doi.org/10.5753/jidm.2012.1460

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

The process of discovering groups of similar vertices in a graph, known as graph clustering, has interesting applications in many dierent scenarios, such as marketing and recommendation systems. One of the most important aspects of graph clustering is the evaluation of cluster quality, which is important not only to measure the eectiveness of clustering algorithms, but also to give insights on the dynamics of relationships in a given network. Many quality metrics for graph clustering evaluation exist, but the most popular ones have strong biases and structural inconsistencies that cause the quality of their results to be, at least, doubtful. Our studies showed that, while in general those popular quality metrics do a good job evaluating the external sparsity between clusters, they do poorly when evaluating the internal density of those clusters, ignoring essential information (such as a cluster's vertex count) or having its internal density component ignored in practice because of its computational cost. In this article, we propose a new method for evaluating the internal density of a given cluster, one that not only uses more complete information to evaluate that density, but also takes into consideration structural characteristics of the original graph. With our proposed method, the internal density of a cluster is evaluated in terms of the expected density of similar clusters in that same graph, in contrast to the traditional quality metrics available, where clusters from dierent graphs are compared by the same standards. We believe that, if used in conjunction with a good external sparsity evaluation metric, like conductance, this method will help to obtain better, more signi cant graph clustering evaluation results.

Full Text