Abstract

In clustering analysis of categorical data, the distance measure between two objects plays an important role. However, most existing categorical distance metrics do not distinguish between nominal and ordinal attributes: they ignore the order relationship among ordinal values and so leave the information it carries unused. This paper therefore proposes a novel clustering algorithm that measures distances for both nominal and ordinal attributes within a unified framework while preserving the distinct characteristics of each. The basic idea is that attribute value pairs with a larger co-occurrence probability in the same cluster should have smaller distances. Accordingly, the distances between categories are evaluated dynamically from the current cluster structure of the data samples, and the distances and the cluster assignments are then learned alternately until convergence. Experimental results show that the proposed algorithm is more robust and performs better than existing counterparts on different kinds of categorical data sets.
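
The alternating scheme can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract gives no formulas, so every concrete choice here is an assumption made for illustration. The function names (`learn_distances`, `assign`, `cluster`) are hypothetical. The distance between two categories is taken as one minus the sum over clusters of P(cluster | a) · P(cluster | b), and each object is assigned to the cluster whose members are closest on average. The sketch treats all attributes as nominal; how the order of ordinal values enters the distance is not stated in the abstract and is left out.

```python
from collections import Counter
from math import inf


def learn_distances(data, labels, k):
    """Estimate, per attribute, a distance between every pair of categories
    from the current clusters (assumed formula, not taken from the paper)."""
    dist = []
    for r in range(len(data[0])):
        values = sorted({row[r] for row in data})
        totals = Counter(row[r] for row in data)
        joint = Counter((row[r], lab) for row, lab in zip(data, labels))
        # post[v][c] approximates P(cluster c | attribute value v)
        post = {v: [joint[(v, c)] / totals[v] for c in range(k)] for v in values}
        d = {}
        for a in values:
            for b in values:
                if a == b:
                    d[(a, b)] = 0.0
                else:
                    # Values that often share a cluster get a small distance.
                    co = sum(pa * pb for pa, pb in zip(post[a], post[b]))
                    d[(a, b)] = 1.0 - co
        dist.append(d)
    return dist


def assign(data, labels, k, dist):
    """Move each object to the cluster with the smallest mean distance to
    its current members; empty clusters are skipped."""
    n_attrs = len(data[0])
    profiles = []
    for c in range(k):
        members = [row for row, lab in zip(data, labels) if lab == c]
        if members:
            freqs = [Counter(row[r] for row in members) for r in range(n_attrs)]
            profiles.append((len(members), freqs))
        else:
            profiles.append(None)
    new_labels = []
    for row in data:
        best, best_cost = None, inf
        for c, profile in enumerate(profiles):
            if profile is None:
                continue
            size, freqs = profile
            cost = sum(
                cnt * dist[r][(row[r], v)]
                for r in range(n_attrs)
                for v, cnt in freqs[r].items()
            ) / size
            if cost < best_cost:
                best, best_cost = c, cost
        new_labels.append(best)
    return new_labels


def cluster(data, init_labels, k, max_iter=50):
    """Alternate distance learning and reassignment until labels stop changing."""
    labels = list(init_labels)
    for _ in range(max_iter):
        dist = learn_distances(data, labels, k)
        new_labels = assign(data, labels, k, dist)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, learn_distances(data, labels, k)
```

A call such as `cluster(rows, initial_labels, k=2)`, where `rows` is a list of equal-length tuples of category values, returns the final labels together with one distance table per attribute. Like other alternating procedures, this sketch depends on its initial labels and can stop at a poor local solution.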
