Estimating the number of clusters using diversity

Suneel Kumar Kingrani, Mark Levene, Dell Zhang

doi:10.5430/air.v7n1p15


Open Access


Abstract

Estimating the number of clusters in a dataset is an important and challenging problem in unsupervised learning. Knowing the number of clusters is a prerequisite for many commonly used clustering algorithms, such as k-means. In this paper, we propose a novel diversity-based approach to this problem. Specifically, we show that the difference between the global diversity of the clusters and the sum of the local diversities of the members within each cluster can serve as an effective indicator of whether the number of clusters is optimal, where diversity is measured by Rao’s quadratic entropy. A notable advantage of the proposed method is that it encourages balanced clustering by taking into account both the sizes of the clusters and the distances between them. In other words, it is less prone than existing methods to producing very small “outlier” clusters. Our extensive experiments on both synthetic and real-world datasets (with known ground-truth clustering) demonstrate that the proposed method is robust to clusters of different sizes, variances, and shapes, and that it finds the optimal number of clusters more accurately than existing methods, including the elbow method, the Caliński-Harabasz index, the silhouette coefficient, and the gap statistic.
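The abstract does not give the exact formulas, so the following is only a sketch under our own assumptions, not the authors' full indicator. It computes Rao's quadratic entropy, Q = Σ_i Σ_j p_i p_j d_ij, and applies it to a hypothetical "global" diversity in which each cluster is an item weighted by its relative size and d_ij is the Euclidean distance between cluster centroids. The function names `rao_quadratic_entropy`, `centroid`, and `global_diversity` are illustrative. The example shows why a measure of this kind depends on both cluster sizes and inter-cluster distances: for two clusters at distance d, Q = 2 p_1 p_2 d, which is largest when the clusters are equal in size and shrinks as one of them becomes a tiny "outlier" cluster.

```python
import math


def rao_quadratic_entropy(weights, items):
    """Rao's quadratic entropy Q = sum_i sum_j p_i * p_j * d(x_i, x_j).

    Assumption: d is Euclidean distance and the weights sum to 1.
    """
    return sum(
        wi * wj * math.dist(xi, xj)
        for wi, xi in zip(weights, items)
        for wj, xj in zip(weights, items)
    )


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def global_diversity(clusters):
    """Hypothetical global diversity: each cluster is weighted by its share
    of all points and compared with the others via centroid distance."""
    n = sum(len(c) for c in clusters)
    weights = [len(c) / n for c in clusters]
    centroids = [centroid(c) for c in clusters]
    return rao_quadratic_entropy(weights, centroids)


# Two clusters whose centroids are 10 units apart, split 50/50 vs. 95/5.
balanced = [[(0.0, 0.0)] * 50, [(10.0, 0.0)] * 50]
skewed = [[(0.0, 0.0)] * 95, [(10.0, 0.0)] * 5]
print(global_diversity(balanced))  # 2 * 0.5 * 0.5 * 10 = 5.0
print(global_diversity(skewed))    # 2 * 0.95 * 0.05 * 10, approximately 0.95
```

The same `rao_quadratic_entropy` function could also be applied to the members of a single cluster to obtain a local diversity; how the paper weights and combines those local terms is not stated in the abstract, so it is left out here.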


Journal: Artificial Intelligence Research
Publication Date: Dec 18, 2017
Citations: 40

Similar Papers

Determination of Different Sizes of Partitioning Clusters in a Highly Connected Graph
Kittichai Lavangnananda ... Chidchanok Panyarit
01 Jan 2019

Improving the Dynamic Clustering of Hyperspectral Data Based on the Integration of Swarm Optimization and Decision Analysis
Amin Alizadeh Naeini ... Mohammad Saadatseresht
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | VOL. 7
01 Jun 2014

K-normal: An Improved K-means for Dealing with Clusters of Different Sizes
Yonggang Lu ... Xiaochun Wang
01 Jan 2017

Objectively Determining the Number of Similar Hydrographic Clusters with Unsupervised Machine Learning
Carola Trahms ... Arne Biastoch
15 May 2023
