Abstract
Categorical data clustering has attracted considerable attention recently because of its importance in real-world applications. Many clustering methods have been proposed for categorical data. However, most existing algorithms require a predefined number of clusters, which is usually unavailable in real-world problems. Only a few works have focused on automatic clustering, and these mainly handle numerical data. This study develops a novel automatic fuzzy clustering algorithm using non-dominated sorting particle swarm optimization (AFC-NSPSO) for categorical data. The proposed AFC-NSPSO algorithm automatically identifies the optimal number of clusters and produces the clustering result for the selected number of clusters. In addition, a new technique based on local density is investigated to identify the maximum number of clusters in a dataset. To select a final solution from the first Pareto front, several internal validation indices are used. On real-world datasets collected from the UCI machine learning repository, the proposed AFC-NSPSO performs effectively compared with other existing automatic categorical clustering algorithms. This study also applies the proposed algorithm to a real-world case study with an unknown number of clusters.
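As an illustration of what "the first Pareto front" means in the abstract, the following minimal sketch extracts the non-dominated candidates from a set of objective vectors. It assumes that every objective is minimized and uses made-up two-objective scores; the function names `dominates` and `first_pareto_front` are hypothetical, and the abstract does not state which objectives AFC-NSPSO actually optimizes.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b, assuming all objectives are minimized.

    a dominates b when a is no worse on every objective and strictly
    better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def first_pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical objective values for five candidate clusterings
# (illustrative numbers only, not taken from the paper).
candidates = [(0.2, 0.9), (0.4, 0.5), (0.5, 0.6), (0.9, 0.1), (0.9, 0.9)]
front = first_pareto_front(candidates)
print(front)
```

This pairwise check is quadratic in the number of candidates, which is adequate for a small swarm. A final solution would then be picked from `front` using a separate criterion, such as the internal validation indices the abstract mentions.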
Highlights
Clustering is a popular technique that partitions a dataset into multiple distinct clusters, based on a similarity or dissimilarity measure, in order to exploit the structure of the dataset.
Few studies have addressed automatic clustering for categorical data. Examples include automatic top-down clustering (AT-DC) [22], the Best-K Plot method (BKPlot) [23], categorical data clustering with automatic selection of cluster number [24], divisive hierarchical clustering of categorical data [25], projected clustering for categorical data (PROCAD) [26], and multi-objective clustering based on sequential games (MOCSG) [27].
Motivated by the aforementioned issues, this study focuses on developing a novel automatic fuzzy clustering algorithm based on non-dominated sorting particle swarm optimization for categorical data.
Summary
Clustering is a popular technique that partitions a dataset into multiple distinct clusters, based on a similarity or dissimilarity measure, in order to exploit the structure of the dataset. Few studies have addressed automatic clustering for categorical data. Examples include automatic top-down clustering (AT-DC) [22], the Best-K Plot method (BKPlot) [23], categorical data clustering with automatic selection of cluster number [24], divisive hierarchical clustering of categorical data [25], projected clustering for categorical data (PROCAD) [26], and multi-objective clustering based on sequential games (MOCSG) [27]. These studies are reviewed in detail in the following section. Motivated by the aforementioned issues, this study focuses on developing a novel automatic fuzzy clustering algorithm based on non-dominated sorting particle swarm optimization (abbreviated as AFC-NSPSO) for categorical data. The clustering performance of the proposed AFC-NSPSO is compared with that of several automatic clustering algorithms for categorical data in terms of the optimal number of clusters, adjusted Rand index (ARI), and cluster accuracy.
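For readers unfamiliar with the ARI used in the comparison, the sketch below computes the standard adjusted Rand index from two label vectors using pair counts over their contingency table. It is a plain-Python illustration, not the evaluation code used in the study; the function name `adjusted_rand_index` and the toy labels are assumptions made for the example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two items: the partitions agree trivially.
        return 1.0

    # Contingency counts: joint cells, row marginals, column marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs placed together in each cell or marginal.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions put every item in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy labels: the same grouping under different cluster names,
# and a grouping that cuts across the true one.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_grouping, crossed_grouping)
```

The index is invariant to how clusters are named, so it can compare a predicted partition against ground-truth classes even when the number of clusters found differs from the number of classes.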