Clustering is a complex unsupervised method that groups the most similar observations of a dataset into the same cluster. To be efficient, a clustering process should achieve high accuracy with low complexity. Many clustering methods have been developed in various fields, depending on the type of application and the type of data considered. Categorical clustering segments datasets whose data are categorical and has been widely used in many real-world applications. Accordingly, several methods have been developed, including hard, fuzzy, and rough set-based methods. In this survey, more than 30 categorical clustering algorithms are investigated. These methods are classified into hierarchical and partitional clustering methods and compared in terms of their accuracy, precision, and recall to identify the most prominent ones. Experimental results show that rough set-based clustering methods provide better efficiency than hard and fuzzy methods. In addition, methods based on the initialization of the centroids also provide good results.