Clustering Algorithm with Learnable Distance for Categorical Data with Nominal and Ordinal Attributes

Hong Jia, Weiwei Zhong

doi:10.1109/dsit55514.2022.9943828

Abstract

For clustering analysis of categorical data, the distance measure between two objects often plays a critical role. However, most existing categorical distance metrics do not distinguish between nominal and ordinal attributes: they do not exploit the information contained in ordinal values and ignore the order relationship among them. This paper therefore proposes a novel clustering algorithm that uses a unified framework to measure distances on both nominal and ordinal attributes while respecting the different characteristics of each. The basic idea of the proposed method is that attribute value pairs with a larger probability of co-occurring in the same cluster are expected to have smaller distances. Accordingly, the distances between categories are evaluated dynamically from the current cluster structure of the data samples, and the distances and cluster memberships are then learned alternately until convergence. Experimental results show that the proposed algorithm is more robust and performs better than existing counterparts on different kinds of categorical data sets.
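The abstract states the idea (values that co-occur in the same cluster are close; distances and clusters are learned alternately) but not the exact formulas. The sketch below is one possible realization under assumptions that are ours, not the paper's: each value is described by its cluster profile p(cluster | value), two nominal values are compared with the total variation distance between their profiles, an ordinal distance is the sum of adjacent-step distances so it never shrinks along the order, an object is assigned to the cluster with the smallest mean learned distance to its members, and the first clusters come from farthest-first seeding under simple-matching distances. The function names `initial_distances`, `value_distances`, and `cluster` are hypothetical.

```python
from collections import Counter


def initial_distances(column, order=None):
    """Starting distances before any clusters exist (assumed): simple matching
    for nominal values, normalized rank gap for ordinal values."""
    if order is None:
        vals = set(column)
        return {(a, b): float(a != b) for a in vals for b in vals}
    pos = {v: i for i, v in enumerate(order)}
    steps = max(len(order) - 1, 1)
    return {(a, b): abs(pos[a] - pos[b]) / steps for a in order for b in order}


def value_distances(column, labels, k, order=None):
    """Learned distances between the values of one attribute, given the current
    labels (assumed rule). Values whose objects share clusters end up close."""
    values = list(order) if order is not None else sorted(set(column))
    # p(cluster | value) for every value of this attribute
    profile = {}
    for v in values:
        counts = Counter(lab for x, lab in zip(column, labels) if x == v)
        total = sum(counts.values())
        profile[v] = [counts[c] / total if total else 0.0 for c in range(k)]

    def tv(a, b):
        # total variation distance between the two cluster profiles
        return 0.5 * sum(abs(p - q) for p, q in zip(profile[a], profile[b]))

    if order is None:  # nominal: compare every pair directly
        return {(a, b): tv(a, b) for a in values for b in values}
    # ordinal: accumulate adjacent steps so distance never shrinks along the order
    step = [tv(values[i], values[i + 1]) for i in range(len(values) - 1)]
    pos = {v: i for i, v in enumerate(values)}
    scale = max(len(values) - 1, 1)  # keeps ordinal distances in [0, 1]
    dist = {}
    for a in values:
        for b in values:
            i, j = sorted((pos[a], pos[b]))
            dist[(a, b)] = sum(step[i:j]) / scale
    return dist


def cluster(data, k, ordinal=None, max_iter=20):
    """Alternate between learning value distances and reassigning objects.
    `ordinal` maps an attribute index to its ordered list of levels."""
    ordinal = ordinal or {}
    n, m = len(data), len(data[0])
    cols = [[row[r] for row in data] for r in range(m)]
    dists = [initial_distances(cols[r], ordinal.get(r)) for r in range(m)]

    def obj_dist(x, y):
        # reads the current `dists`, so it follows each relearning step
        return sum(dists[r][(x[r], y[r])] for r in range(m))

    # farthest-first seeding under the initial distances
    seeds = [0]
    while len(seeds) < k:
        far = max(range(n), key=lambda i: min(obj_dist(data[i], data[s]) for s in seeds))
        seeds.append(far)
    labels = [min(range(k), key=lambda c: obj_dist(row, data[seeds[c]])) for row in data]

    for _ in range(max_iter):
        # (1) learn distances from the current cluster structure
        dists = [value_distances(cols[r], labels, k, ordinal.get(r)) for r in range(m)]
        members = [[row for row, lab in zip(data, labels) if lab == c] for c in range(k)]

        def to_cluster(x, c):
            # mean learned distance from x to the members of cluster c
            if not members[c]:
                return float("inf")
            return sum(obj_dist(x, y) for y in members[c]) / len(members[c])

        # (2) reassign every object to its nearest cluster
        new_labels = [min(range(k), key=lambda c: to_cluster(row, c)) for row in data]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
    return labels


data = [("red", "S"), ("red", "S"), ("red", "M"),
        ("blue", "L"), ("blue", "L"), ("blue", "M")]
print(cluster(data, k=2, ordinal={1: ["S", "M", "L"]}))  # prints [0, 0, 0, 1, 1, 1]
```

Treating ordinal distances as cumulative steps is one simple way to make the learned metric respect the order relationship that the abstract says most nominal-only metrics ignore: whatever the cluster profiles look like, "S" can never end up closer to "L" than to "M".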

Full Text