Clustering Validation of CLARA and K-Means Using Silhouette & DUNN Measures on Iris Dataset

Tanvi Gupta, Supriya P Panda

doi:10.1109/comitcon.2019.8862199

Abstract

This paper compares two clustering techniques, Clustering Large Applications (CLARA) and K-Means, on the popular Iris dataset. Both are “partitioning based” clustering techniques. CLARA forms clusters around medoids chosen from random samples of the data, whereas K-Means forms clusters around centroids (means) of the data. For both techniques, the cluster plot, the Silhouette plot and the Dunn Index on the Iris dataset are shown; all three are used for “cluster validation”. “Silhouette Analysis” measures an approximate average distance between clusters, and the “Silhouette plot” shows how close the points in one cluster are to the points in neighboring clusters. The other internal clustering validation measure is the Dunn Index: the higher the “Dunn Index”, the better the clustering. All statistical analysis is done in R. The results indicate that CLARA clustering performs better than K-Means clustering.
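The two internal validation measures named above can be illustrated with a minimal sketch. This is not the authors' R code: it is written in plain Python, and it assumes the common textbook definitions, with silhouette width s(i) = (b - a) / max(a, b) and the Dunn Index taken as the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The toy points and the two labelings `good` and `poor` are hypothetical and are not the Iris data.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def silhouette_widths(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every point (assumed definition)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: width is 0 by convention
            widths.append(0.0)
            continue
        # a: mean distance to the other points of the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        widths.append((b - a) / max(a, b))
    return widths


def dunn_index(points, labels):
    """Minimum between-cluster distance over maximum within-cluster distance."""
    pairs = [(i, j) for i in range(len(points)) for j in range(i + 1, len(points))]
    min_between = min(dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j])
    max_within = max(dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j])
    return min_between / max_within


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
good = [0, 0, 0, 1, 1, 1]  # labeling that matches the two groups
poor = [0, 0, 1, 1, 1, 0]  # labeling that mixes the two groups

for name, labels in (("good", good), ("poor", poor)):
    avg_s = sum(silhouette_widths(points, labels)) / len(points)
    print(name, "average silhouette:", round(avg_s, 3),
          "Dunn index:", round(dunn_index(points, labels), 3))
```

Under these assumed definitions, the labeling that matches the two groups scores higher on both measures, which is the sense in which a higher Dunn Index or a larger average silhouette width indicates better clustering. The sketch needs at least two clusters, at least one cluster with two or more points, and no duplicate points.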
