A Criterion for Deciding the Number of Clusters in a Dataset Based on Data Depth

Ishwar Baidari, Channamma Patil

doi:10.1142/s2196888820500232

Open Access

Journal: Vietnam Journal of Computer Science
Publication Date: Jul 8, 2020
Citations: 3
License type: CC BY

Affiliation: Karnatak University

Abstract

Clustering is a key method in unsupervised learning, with applications in data mining, pattern recognition and intelligent information processing. However, the number of groups to be formed, usually denoted $k$, is a vital parameter for most existing clustering algorithms, as their results depend heavily on it. Finding the optimal value of $k$ is very challenging. This paper proposes a novel idea for finding the correct number of groups in a dataset based on data depth. The idea is to avoid the traditional process of running the clustering algorithm over a dataset [Formula: see text] times and, further, to find the value of $k$ without setting any specific search range for the $k$ parameter. We experiment with different indices, namely CH, KL, Silhouette, Gap and CSP, together with the proposed method, on real and synthetic datasets to estimate the correct number of groups in a dataset. The experimental results on real and synthetic datasets indicate good performance of the proposed method.

Full Text