A Complementary Optimization Procedure for Final Cluster Analysis of Clustering Categorical Data

Ali Seman, Azizian Mohd Sapawi

doi:10.1007/978-3-030-33585-4_30

Abstract

Clustering analysis has become an indispensable tool for obtaining and analyzing meaningful groups in both numerical and categorical data. Algorithms such as fuzzy k-Modes, New fuzzy k-Modes, k-AMH, and the extended k-AMH algorithms Nk-AMH I, II, and III are commonly used to improve the clustering of categorical data. However, the performance of these algorithms is evaluated by the average accuracy score over 100-run experiments, which requires labeled data. Consequently, their performance on unlabeled data cannot be measured explicitly. This paper extends the k-AMH model with complementary optimization procedures, known as Ck-AMH I, II, III, and IV, to obtain final and optimal clustering results. In experiments on five categorical datasets (Soybean, Zoo, Hepatitis, Voting, and Breast), the complementary procedures produced optimal clustering results. The optimal accuracy scores were marginally lower than, and in some cases identical to, the maximum accuracy scores obtained from the 100-run experiments. With the complementary procedures, these clustering algorithms can therefore be further developed as workbench clustering tools for both unlabeled categorical and unlabeled numerical data.
