Cluster validity profiles

Thomas A Bailey, Richard Dubes

doi:10.1016/0031-3203(82)90002-4

Abstract

The quantitative evaluation of clusters has lagged far behind the development of clustering algorithms. This paper introduces a new procedure, based on probability profiles, for judging the validity of clusters established from rank-order proximity data. Probability profiles furnish a comprehensive picture of the compactness and isolation of a cluster, scaled in probability units. Given a rank-order proximity matrix and a cluster to be examined, profiles compare the cluster's compactness and isolation with upper bounds on the best compactness and isolation indices one would expect in a randomly chosen graph. After reviewing the pertinent literature, this paper explains the background from graph theory and cluster analysis needed to treat cluster validity. The probabilities and bounds needed to form cluster profiles are derived, and strategies for using profiles are suggested. Special attention is given to the underlying probability models. Profiles are demonstrated on four artificially generated data sets, two of which have good hierarchical structure, and on data from a speaker recognition project. They reject spurious clusters and accept apparently valid clusters. Since profiles quantify the interaction between a cluster and its environment, they provide a much richer source of information on cluster structure than the single-number indices proposed in the literature.

Full Text