Abstract

The quantitative evaluation of clusters has lagged far behind the development of clustering algorithms. This paper introduces a new procedure, based on probability profiles, for judging the validity of clusters established from rank-order proximity data. Probability profiles furnish a comprehensive picture of the compactness and isolation of a cluster, scaled in probability units. Given a rank-order proximity matrix and a cluster to be examined, profiles compare the cluster's compactness and isolation indices with upper bounds on the best indices one would expect in a randomly chosen graph. After reviewing the pertinent literature, this paper explains the background from graph theory and cluster analysis needed to treat cluster validity. The probabilities and bounds needed to form cluster profiles are derived, and strategies for using profiles are suggested. Special attention is given to the underlying probability models. Profiles are demonstrated on four artificially generated data sets, two of which have good hierarchical structure, and on data from a speaker recognition project. They reject spurious clusters and accept apparently valid clusters. Since profiles quantify the interaction between a cluster and its environment, they provide a much richer source of information on cluster structure than the single-number indices proposed in the literature.
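
The abstract does not spell out the indices or the bounds, so the following is only a minimal illustration under stated assumptions, not the paper's method. It assumes (hypothetically) that, at threshold level v (the graph containing the v most similar pairs), the compactness index is the number of edges inside the cluster and the isolation index is the number of edges leaving it, and that the random-graph model makes every set of v edges equally likely. For a fixed vertex set both counts are then hypergeometric, so the sketch computes exact tail probabilities; small values suggest a cluster better than chance. Unlike the paper's profiles, it ignores the best index over all possible subsets (the role of the upper bounds), so it is optimistic for clusters chosen by an algorithm. All function names are hypothetical.

```python
from math import comb

def threshold_edges(ranks, v):
    """Edges of threshold graph G_v: pairs (i, j), i < j, with rank <= v."""
    n = len(ranks)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if ranks[i][j] <= v}

def cluster_indices(ranks, cluster, v):
    """Hypothetical indices at level v: (#edges inside cluster, #edges leaving it)."""
    inside = leaving = 0
    for i, j in threshold_edges(ranks, v):
        a, b = i in cluster, j in cluster
        if a and b:
            inside += 1
        elif a or b:
            leaving += 1
    return inside, leaving

def hypergeom_sf(x, total, good, draws):
    """P(X >= x) for X ~ Hypergeometric(total pairs, good pairs, draws)."""
    hits = sum(comb(good, t) * comb(total - good, draws - t)
               for t in range(x, min(good, draws) + 1))
    return hits / comb(total, draws)

def profiles(ranks, cluster):
    """For each level v, return (v, P(compactness >= observed), P(leaving <= observed))."""
    n, k = len(ranks), len(cluster)
    total = n * (n - 1) // 2   # all distinct pairs
    k_in = k * (k - 1) // 2    # pairs inside the cluster
    k_out = k * (n - k)        # pairs linking the cluster to the rest
    rows = []
    for v in range(1, total + 1):
        inside, leaving = cluster_indices(ranks, cluster, v)
        p_compact = hypergeom_sf(inside, total, k_in, v)
        # Isolation: chance of at most this many edges leaving the cluster.
        p_isolated = 1 - hypergeom_sf(leaving + 1, total, k_out, v)
        rows.append((v, p_compact, p_isolated))
    return rows

# Toy rank-order matrix (1 = most similar pair); diagonal is unused.
ranks = [
    [0, 1, 3, 4],
    [1, 0, 5, 6],
    [3, 5, 0, 2],
    [4, 6, 2, 0],
]
for v, pc, pi in profiles(ranks, {0, 1}):
    print(v, round(pc, 3), round(pi, 3))
```

Computing exact hypergeometric tails for one fixed cluster keeps the sketch short; handling the maximization over all subsets, as the paper's bounds do, would require a different derivation.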
