Abstract

A highly popular method for examining the stability of a data clustering is to split the data into two parts, cluster the observations in Part A, assign the observations in Part B to their nearest Part A centroid, and then cluster the Part B observations independently. One then examines how close the two partitions of Part B are (say, by the Rand measure). Another proposal is to split the data into k parts and see how their centroids cluster. By means of synthetic data analyses, we demonstrate that these approaches fail to identify the appropriate number of clusters, particularly as the sample size becomes large and the variables exhibit higher correlations.

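To make the split-half procedure concrete, the following is a minimal sketch, assuming k-means as the clustering algorithm and scikit-learn's adjusted Rand index as a stand-in for the plain Rand measure; the function name split_half_stability and its parameters are illustrative and not taken from the paper.

```python
# Sketch of the split-half stability check described in the abstract,
# assuming k-means and the adjusted Rand index (names are illustrative).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def split_half_stability(X, k, random_state=0):
    """Split X into parts A and B, cluster A, transfer its labels to B by
    nearest centroid, cluster B independently, and compare the two
    labelings of B with the adjusted Rand index."""
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    A, B = X[idx[:half]], X[idx[half:]]

    # Cluster Part A and assign Part B points to their nearest A centroid.
    km_a = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(A)
    labels_b_from_a = km_a.predict(B)

    # Cluster Part B independently.
    km_b = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(B)
    labels_b = km_b.labels_

    # Agreement between the two partitions of Part B (1.0 = identical).
    return adjusted_rand_score(labels_b_from_a, labels_b)


if __name__ == "__main__":
    # Three well-separated synthetic clusters as a toy illustration.
    X = np.vstack([np.random.randn(200, 2) + c
                   for c in ([0, 0], [5, 5], [0, 5])])
    for k in range(2, 6):
        print(k, round(split_half_stability(X, k), 3))
```

In practice such a check would typically be repeated over many random splits and over a range of candidate values of k, with the average agreement compared across k; the abstract's point is that this agreement need not be highest at the correct number of clusters.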