Comparative analysis of selected algorithms for qualitative data clustering

Agnieszka Nowak Brzezińska, Jakub Królik

doi:10.1016/j.procs.2023.10.416

Abstract

Data clustering can be applied to both numerical and categorical attributes. Numerical attributes are those whose values can be expressed as numbers. This work does not focus directly on that category of attributes. The primary goal of the research is to compare the clustering process, the quality of the obtained clusters, and the clustering results of selected algorithms applied to numerical data sets (obtained using coding techniques) and to qualitative data sets. Experiments were performed on three datasets. The algorithms used to group the observations into classes were the k-means algorithm and the k-modes algorithm. Three distance measures were used with the k-means algorithm: the Euclidean measure, the Manhattan (city-block) measure, and the Chebyshev measure. It can be concluded that, for the structures of the three selected data sets, the k-modes algorithm yields better clustering quality and thus assigns observations to classes more effectively. It also does so in much less time than the k-means algorithm. The key qualifier here, however, is the structure of the dataset.
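For orientation, the distance measures named in the abstract can be sketched in a few lines of Python. The abstract does not state the formulas, so the code below uses the standard textbook definitions. The simple matching dissimilarity and the attribute-wise mode are the usual building blocks of k-modes, and it is an assumption that the paper uses them in this form. All function names are illustrative.

```python
from collections import Counter


def euclidean(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    """Manhattan (city-block) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Chebyshev distance: largest absolute difference over all attributes."""
    return max(abs(x - y) for x, y in zip(a, b))


def matching_dissimilarity(a, b):
    """Simple matching dissimilarity (assumed here for k-modes):
    the number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(rows):
    """Attribute-wise mode of a group of categorical records.
    In k-modes this plays the role that the mean plays in k-means."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*rows))


# Numerical records compared with the three k-means measures.
p, q = (0, 0), (3, 4)
numeric_distances = (euclidean(p, q), manhattan(p, q), chebyshev(p, q))

# Categorical records compared by counting mismatched attributes.
records = [("red", "small"), ("red", "large"), ("blue", "large")]
mismatches = matching_dissimilarity(records[0], records[1])
mode = cluster_mode(records)
```

The numerical measures require the categorical values to be encoded as numbers first, whereas the matching dissimilarity operates on the categories directly, which is the distinction the comparison in this work rests on.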

Full Text