A rough set based algorithm for updating the modes in categorical clustering.

Semeh Ben Salem,Sami Naouali,Zied Chtourou

doi:10.1007/s13042-021-01293-w

Abstract

The categorical clustering problem has attracted much attention especially in the last decades since many real world applications produce categorical data. The k-mode algorithm, proposed since 1998, and its multiple variants were widely used in this context. However, they suffer from a great limitation related to the update of the modes in each iteration. The mode in the last step of these algorithms is randomly selected although it is possible to identify many candidate ones. In this paper, a rough density mode selection method is proposed to identify the adequate modes among a list of candidate ones in each iteration of the k-modes. The proposed method, called Density Rough k-Modes (DRk-M) was experimented using real world datasets extracted from the UCI Machine Learning Repository, the Global Terrorism Database (GTD) and a set of collected Tweets. The DRk-M was also compared to many states of the art clustering methods and has shown great efficiency.

Full Text