Abstract

Categorical data clustering has attracted considerable attention recently because of its importance in real-world applications. Many clustering methods have been proposed for categorical data. However, most existing algorithms require a predefined number of clusters, which is usually unavailable in real-world problems. Only a few works have addressed automatic clustering, and those mainly handle numerical data. This study develops a novel automatic fuzzy clustering algorithm using non-dominated sorting particle swarm optimization (AFC-NSPSO) for categorical data. The proposed AFC-NSPSO algorithm automatically identifies the optimal number of clusters and obtains the clustering result for the selected number of clusters. In addition, a new technique based on local density is investigated to identify the maximum number of clusters in a dataset. Internal validation indices are used to select a final solution from the first Pareto front. On real-world datasets collected from the UCI machine learning repository, the proposed AFC-NSPSO performs effectively compared with some existing automatic categorical clustering algorithms. This study also applies the proposed algorithm to a real-world case study with an unknown number of clusters.

Highlights

  • Clustering is a popular technique that partitions a dataset into multiple distinct clusters based on a similarity or dissimilarity measure in order to exploit the structure of the dataset

  • Only a few studies have addressed automatic clustering for categorical data, such as automatic top-down clustering (AT-DC) [22], the Best-K Plot method (BKPlot) [23], categorical data clustering with automatic selection of cluster number [24], divisive hierarchical clustering of categorical data [25], projected clustering for categorical data (PROCAD) [26], and multi-objective clustering based on sequential games (MOCSG) [27]

  • Motivated by the aforementioned issues, this study focuses on developing a novel automatic fuzzy clustering algorithm based on non-dominated sorting particle swarm optimization for categorical data

Summary

INTRODUCTION

Clustering is a popular technique that partitions a dataset into multiple distinct clusters based on a similarity or dissimilarity measure in order to exploit the structure of the dataset. Only a few studies have addressed automatic clustering for categorical data, such as automatic top-down clustering (AT-DC) [22], the Best-K Plot method (BKPlot) [23], categorical data clustering with automatic selection of cluster number [24], divisive hierarchical clustering of categorical data [25], projected clustering for categorical data (PROCAD) [26], and multi-objective clustering based on sequential games (MOCSG) [27]. These studies are reviewed in detail in the following section. Motivated by the aforementioned issues, this study focuses on developing a novel automatic fuzzy clustering algorithm based on non-dominated sorting particle swarm optimization (abbreviated as AFC-NSPSO) for categorical data. The clustering performance of the proposed AFC-NSPSO is compared with that of several automatic clustering algorithms for categorical data in terms of the optimal number of clusters, adjusted Rand index (ARI), and cluster accuracy.
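
For readers unfamiliar with the ARI used in the comparison, the following is a minimal sketch of the standard adjusted Rand index computed from pair counts. It is not taken from the paper; the function name `adjusted_rand_index` and the choice to return 1.0 in degenerate cases are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat labelings of the same objects.

    Hypothetical helper for illustration; not the paper's own code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two objects: no pairs to compare (assumed convention).
        return 1.0

    # Contingency counts: objects sharing a (true class, predicted cluster) pair.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of object pairs that fall together in each cell, row, and column.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings place everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = ["a", "a", "b", "b"]
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same partition, relabeled
    print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # partition that splits every class
```

Because the index depends only on which objects are grouped together, it is unaffected by how cluster labels are named, which makes it suitable when the number of clusters found differs from the number of true classes.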

REVIEW OF AUTOMATIC CLUSTERING FOR CATEGORICAL DATA
IDENTIFY THE MAXIMUM NUMBER
1: Calculate density for each object xi
FITNESS FUNCTION
NSPSO PROCEDURE
EXPERIMENTAL EVALUATION WITH OTHER BENCHMARK ALGORITHMS
TIME COMPLEXITY ANALYSIS
The proposed PM-FGCA consists of two parts
CONCLUSIONS