Abstract

In unsupervised learning, finding the appropriate number of clusters, usually denoted k, is a challenging problem. Its importance lies in the fact that k is a vital hyperparameter for most clustering algorithms. One algorithmic approach to tackling this problem is to apply a given clustering algorithm with various cluster configurations and select the configuration that maximizes a certain internal validity measure. This is a promising and computationally efficient approach, since the independent runs are parallelizable. In this paper, we attempt to improve on this estimation approach by incorporating consensus clustering into the k-estimation scheme. The weighted consensus clustering scheme employs four different indices, namely the Silhouette (SH), Calinski–Harabasz (CH), Davies–Bouldin (DB), and Consensus (CI) indices, to estimate the correct number of clusters. Computational experiments on a dataset with the number of clusters ranging from 2 to 7 show the clear advantages of weighted consensus clustering for correctly finding k, in comparison to an individual clustering method (e.g., k-means) and simple consensus clustering.
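To make the baseline concrete, the following is a minimal sketch of the select-k-by-internal-validity approach the abstract builds on, using scikit-learn. The paper's weighted consensus scheme and its Consensus index (CI) are not reproduced here; equal weights over the three standard indices (SH, CH, DB) are an illustrative assumption, as is the min-max normalization used to make the indices comparable.

```python
# Sketch: estimate k by running k-means over a range of cluster counts and
# picking the k that maximizes a weighted combination of internal validity
# indices. Weights and normalization are illustrative assumptions, not the
# paper's weighted consensus scheme.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    silhouette_score,
    calinski_harabasz_score,
    davies_bouldin_score,
)

def estimate_k(X, k_range=range(2, 8), weights=(1.0, 1.0, 1.0)):
    """Pick k by a weighted sum of normalized SH, CH, and (negated) DB scores."""
    rows = []
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        rows.append([
            silhouette_score(X, labels),         # higher is better
            calinski_harabasz_score(X, labels),  # higher is better
            -davies_bouldin_score(X, labels),    # lower is better, so negate
        ])
    scores = np.asarray(rows)
    # Min-max normalize each index column so the scales are comparable.
    scores = (scores - scores.min(axis=0)) / (np.ptp(scores, axis=0) + 1e-12)
    combined = scores @ np.asarray(weights)
    return list(k_range)[int(np.argmax(combined))]

# Independent runs for each k are embarrassingly parallel; a joblib or
# multiprocessing map over k_range would parallelize the loop above.
X, _ = make_blobs(n_samples=500, centers=4, random_state=42)
print("estimated k:", estimate_k(X))
```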
