How many clusters exist? Answer via maximum clustering similarity implemented in R

Ahmed N Albatineh, Meredith L Wilcox, Bashar Zogheib, Magdalena Niewiadomska-Bugaj

doi:10.1080/24709360.2019.1615770

Abstract

Finding the number of clusters in a data set is one of the fundamental problems in cluster analysis. This paper integrates maximum clustering similarity (MCS), a method for finding the optimal number of clusters, into the R statistical software through the package MCSim. The similarity between two clustering methods is calculated at the same number of clusters, using the Rand [Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971;66:846–850.] and Jaccard [The distribution of the flora of the alpine zone. New Phytologist. 1912;11:37–50.] indices, corrected for chance agreement. The number of clusters at which the index most frequently attains its maximum is a candidate for the optimal number of clusters. Unlike other criteria, MCS can be used with circular data. Seven clustering algorithms available in R are implemented in MCSim. The package produces a graph of the number of clusters versus clustering similarity based on the corrected similarity indices, along with the values of the similarity indices and a clustering tree (dendrogram). Several examples, including simulated, real, and circular data sets, are presented to show how MCSim works successfully in practice.
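
The selection rule described in the abstract can be sketched in a few lines. The following Python snippet is only an illustration and is not the MCSim API: the function names are hypothetical, the cluster labels are made-up toy data, and the chance correction shown is the common Hubert-Arabie adjusted Rand index, which is an assumption here and may differ from the correction used in the paper. It takes the partitions produced by two clustering methods at each candidate number of clusters, scores their agreement, and reports the number of clusters with the highest agreement.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected Rand index (Hubert-Arabie form; an assumption here)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two objects: partitions agree trivially
    # Pairs of objects placed together by both methods, by A, and by B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (together_both - expected) / (maximum - expected)


def best_number_of_clusters(partitions_a, partitions_b):
    """Return (k, scores): k maximises agreement between the two methods.

    Each argument maps a candidate number of clusters k to the labels
    that one clustering method assigns at that k.
    """
    common = sorted(set(partitions_a) & set(partitions_b))
    scores = {k: adjusted_rand_index(partitions_a[k], partitions_b[k]) for k in common}
    # Ties go to the smallest k because the keys are scanned in sorted order.
    return max(common, key=scores.get), scores


# Toy labels for six objects from two hypothetical clustering methods.
method_a = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
method_b = {2: [1, 1, 1, 0, 0, 0], 3: [0, 0, 0, 1, 2, 2]}

best_k, scores = best_number_of_clusters(method_a, method_b)
print(best_k, scores)
```

In this toy case the two methods agree perfectly at two clusters and only partially at three, so two clusters is selected. Repeating the comparison over several pairs of methods and indices, then tallying which number of clusters wins most often, would mirror the "most frequently" rule in the abstract.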

More From: Biostatistics & Epidemiology

Similar Papers

Development and validation of consensus clustering-based framework for brain segmentation using resting fMRI.
Srikanth Ryali ... Weidong Cai
Journal of neuroscience methods | VOL. 240
29 Nov 2014

A New Algorithm for Fuzzy Clustering Handling Incomplete Dataset
Balkis Abidi ... Sadok Ben Yahia
International Journal on Artificial Intelligence Tools | VOL. 23
01 Aug 2014

Improving the Dynamic Clustering of Hyperspectral Data Based on the Integration of Swarm Optimization and Decision Analysis
Amin Alizadeh Naeini ... Mohammad Saadatseresht
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | VOL. 7
01 Jun 2014

MCS: A Method for Finding the Number of Clusters
Ahmed N Albatineh ... Magdalena Niewiadomska-Bugaj
Journal of Classification | VOL. 28
17 Dec 2010
