Identification of the Optimal Number of Clusters in Textual Data

Tanjina Das, Srikanta Paitnaik, Smita Prava Mishra

doi:10.1007/978-981-16-4807-6_21

Publication Date: Jan 1, 2022

Citations: 1

Affiliation: Siksha O Anusandhan University

#Number Of Clusters For Data #Optimal Number Of Clusters

Abstract

Clustering is an unsupervised process of grouping unlabeled data on the basis of similarity. Clustering a data set without knowing the number of clusters in advance is impractical. The proposed method finds the optimal number of clusters for unclassified textual data using internal cluster validation techniques. The internal validation indices used in this work are the Silhouette index, the Davies-Bouldin index, and the Calinski-Harabasz score, applied together with the K-means clustering algorithm. The result obtained through this method is further validated by the Elbow method of finding the number of clusters.

Keywords: Clustering, Unsupervised, Unclassified, Internal cluster validation, Indexing
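
The procedure named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, a TF-IDF representation of the text (the abstract does not state which features were used), and a small made-up corpus. The names `docs` and `score_k_range` are hypothetical. For each candidate number of clusters k, K-means is fitted and the three internal indices are computed. The K-means inertia is also recorded, since that is the quantity usually plotted for the Elbow method.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Toy corpus, invented purely for illustration (not data from the paper).
docs = [
    "the cat sat on the mat",
    "cats and kittens purr",
    "a kitten chased the cat",
    "stock markets fell sharply",
    "investors sold stock and bonds",
    "the bond market rallied",
    "the team won the football match",
    "a late goal decided the match",
    "football fans cheered the goal",
]


def score_k_range(texts, k_values, seed=0):
    """Fit K-means for each k and return the internal validation scores."""
    # Assumption: TF-IDF features; converted to a dense array because
    # the Davies-Bouldin and Calinski-Harabasz scorers expect dense input.
    X = TfidfVectorizer(stop_words="english").fit_transform(texts).toarray()
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        results[k] = {
            "silhouette": silhouette_score(X, km.labels_),                # higher is better
            "davies_bouldin": davies_bouldin_score(X, km.labels_),        # lower is better
            "calinski_harabasz": calinski_harabasz_score(X, km.labels_),  # higher is better
            "inertia": km.inertia_,                                       # used for the elbow plot
        }
    return results


results = score_k_range(docs, range(2, 6))

# The k preferred by each index; the indices need not agree with each other.
best_by_silhouette = max(results, key=lambda k: results[k]["silhouette"])
best_by_davies_bouldin = min(results, key=lambda k: results[k]["davies_bouldin"])
best_by_calinski_harabasz = max(results, key=lambda k: results[k]["calinski_harabasz"])

for k, scores in results.items():
    print(k, {name: round(value, 3) for name, value in scores.items()})
```

When the three indices point to the same k, that value is a natural candidate for the optimal number of clusters. Plotting `inertia` against k and looking for the bend in the curve gives the Elbow-method cross-check mentioned in the abstract.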
