Abstract

Cluster analysis is a data mining technique that has been widely used to extract useful information from large amounts of data. Because they evaluate candidate solutions with an intracluster distance (ICD) function, traditional single-objective clustering algorithms are ill-suited to data that are not well separated. Specifically, on such data the accuracy of the best solution may drop in the late stages of the search. To overcome this problem, this paper presents a novel index, called intracluster cohesion (ICC), that reflects the similarity of data within a cluster. However, if a multiobjective method is used to cluster with ICD and ICC as the specified objectives, its clustering accuracy may depend on the user's experience. Motivated by these observations, we propose an accelerated two-stage particle swarm optimization (ATPSO) in which $K$-means is used to accelerate the particles' convergence during population initialization. Its clustering process consists of two stages. In the first stage, ICD is minimized as the main objective to produce a preliminary clustering; in the second, ICC is optimized to improve the clustering accuracy. Extensive experiments are conducted on 17 open-source clustering data sets with various geometric distributions. The results show that ATPSO outperforms PSO, $K$-means PSO (KPSO), chaotic PSO (CPSO), and accelerated CPSO in terms of accuracy, and that its efficiency is close to that of KPSO. Its convergence trend indicates that the proposed ICC contributes to the clustering accuracy. Remarkably, compared with the Pareto-based multiobjective PSO, ATPSO detects clusters more accurately and more quickly through the proposed two-stage search.
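
To make the ICD objective and the $K$-means-assisted initialization more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that ICD is the sum of Euclidean distances from each point to its nearest centroid, that each particle encodes a full set of cluster centroids, and that a single particle is seeded with a $K$-means result; the abstract specifies none of these details. The function names (`intracluster_distance`, `kmeans`, `init_swarm`) are hypothetical. ICC is not defined in the abstract, so the second stage is not sketched.

```python
import numpy as np

def intracluster_distance(data, centroids):
    """Assumed ICD: sum of distances from each point to its nearest centroid."""
    # dists[i, j] is the Euclidean distance from point i to centroid j
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return dists.min(axis=1).sum()

def kmeans(data, k, rng, iters=20):
    """Plain Lloyd's K-means; returns an array of k centroids."""
    # Fancy indexing returns a copy, so `data` is never modified
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def init_swarm(data, k, n_particles, rng):
    """Each particle holds k centroids drawn from the data.

    Assumption: only particle 0 is replaced by a K-means solution.
    """
    idx = np.stack([rng.choice(len(data), size=k, replace=False)
                    for _ in range(n_particles)])
    swarm = data[idx].copy()  # shape: (n_particles, k, n_features)
    swarm[0] = kmeans(data, k, rng)
    return swarm

# Toy data: two small, well-separated groups (must be a float array)
data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
rng = np.random.default_rng(0)
swarm = init_swarm(data, k=2, n_particles=5, rng=rng)
fitness = [intracluster_distance(data, particle) for particle in swarm]
```

Under these assumptions, the first stage would run standard PSO velocity and position updates with `intracluster_distance` as the fitness function, starting from this swarm.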

