Abstract

Smart devices and technology applications are now used in many fields, and large volumes of information are recorded and collected rapidly. Data analysis, and clustering analysis in particular, is therefore vital for extracting valuable information from datasets. However, data can have numerical, categorical, or mixed attributes, and some datasets also contain noise and outliers. An appropriate clustering method is needed to exploit the structure of the data. This study proposes a clustering algorithm called the possibilistic fuzzy k-modes (PFKM) algorithm, which combines the concept of possibility with the fuzzy k-modes (FKM) algorithm to reduce the effect of outliers and to improve clustering results for categorical data. To further increase clustering performance, the study also applies three metaheuristics: the genetic algorithm (GA), particle swarm optimization (PSO), and the sine-cosine algorithm (SCA). This yields three clustering algorithms: GA-PFKM, PSO-PFKM, and SCA-PFKM. Their performance is compared with that of the classical FKM algorithm using two indices: the sum-of-squared error (SSE) and the accuracy. The experimental results show that the PSO-PFKM and SCA-PFKM algorithms perform better for most datasets.
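
For readers unfamiliar with the baseline, the following is a minimal sketch of the membership update in the classical FKM algorithm, assuming the usual simple matching dissimilarity for categorical records. It is not the PFKM algorithm proposed here, whose possibilistic update equations are not given in this abstract. The function names, the fuzzifier default m = 2, and the even split among tied modes are illustrative assumptions.

```python
def matching_dissimilarity(record, mode):
    """Count the attributes on which two categorical records differ."""
    return sum(1 for a, b in zip(record, mode) if a != b)


def fuzzy_memberships(record, modes, m=2.0):
    """Return the fuzzy membership of one record in each cluster mode.

    Classical FKM-style update with fuzzifier m > 1 (assumed default 2.0).
    """
    d = [matching_dissimilarity(record, z) for z in modes]
    # A record identical to a mode belongs fully to it; splitting evenly
    # among several identical modes is an assumption of this sketch.
    if 0 in d:
        zeros = d.count(0)
        return [1.0 / zeros if di == 0 else 0.0 for di in d]
    exponent = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in d) for di in d]


if __name__ == "__main__":
    modes = [("a", "b", "c"), ("x", "y", "c")]
    print(fuzzy_memberships(("a", "b", "z"), modes))
```

Memberships for each record sum to 1, which is why a distant outlier still receives substantial membership in some cluster; a possibilistic term is the kind of addition intended to relax that constraint.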
