Clustering Algorithm Combining CPSO with K-Means

Chunqin Gu, Qian Tao

doi:10.2991/ameii-15.2015.140

Abstract

A clustering algorithm combining chaotic particle swarm optimization (CPSO) with K-Means (CPSO-KM) is proposed, which features better search efficiency than K-Means, PSO and CPSO. K-Means algorithms cannot guarantee convergence to the global optimum and tend to become trapped at locally optimal cluster centers because they are sensitive to the initial cluster centers. CPSO can find the globally optimal solution, whereas K-Means can reach local optima. The CPSO-KM algorithm combines the global search capability of CPSO with the local search capability of K-Means. CPSO-KM has been tested on two synthetic data sets and three classical data sets from UCI. Experimental results show that CPSO-KM performs better than K-Means, PSO and CPSO.

Introduction

Clustering is a common unsupervised learning method that partitions a set of objects (instances) into groups (clusters) such that objects in the same cluster are similar to each other and dissimilar to the objects in other clusters. The K-Means algorithm [1] partitions the given objects into k clusters based on a distance metric. It is easy to implement and very efficient. Its main drawback is that the clustering result is sensitive to the initial cluster centers and may converge to a local optimum [2].

In recent years, swarm intelligence algorithms such as GA [3], PSO [4] and ACO [5] have been combined with K-Means and applied to many clustering problems because of their global search ability. Such a combination takes advantage of both the global search ability of the swarm intelligence algorithm and the local search ability of K-Means. In our previous work, CPSO [6] showed more precise global search and faster convergence.
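The sensitivity of K-Means to its initial cluster centers can be illustrated with a minimal sketch. This is plain Lloyd-style K-Means written for illustration, not the authors' implementation; the function names (`kmeans`, `sse`, `squared_distance`) and the four-point toy data set are assumptions introduced here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Run Lloyd-style K-Means from the given initial centers.

    Returns (centers, labels). An empty cluster keeps its previous center.
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[k])
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return centers, labels


def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))


# Toy data (hypothetical): corners of a 4 x 1 rectangle, k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initial centers on the left and right sides: converges to the better partition.
good_centers, good_labels = kmeans(points, [(0, 0), (4, 0)])
print(sse(points, good_centers, good_labels))  # 1.0

# Initial centers on the bottom and top edges: stuck in a worse local optimum.
bad_centers, bad_labels = kmeans(points, [(2, 0), (2, 1)])
print(sse(points, bad_centers, bad_labels))  # 16.0
```

Both runs converge, but the second stops at a partition whose error is sixteen times larger. A global search stage that supplies better starting centers is intended to avoid this kind of outcome.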
CPSO is used to obtain better initial cluster centers, and the K-Means algorithm is then run from the centers found by CPSO.

The rest of the paper is organized as follows. Section 2 describes the model of clustering problems. Section 3 presents our clustering algorithm combining CPSO with K-Means (CPSO-KM). Section 4 presents the experimental results. Finally, Section 5 concludes the paper.

Model of clustering problem

In a clustering problem, a cluster is a collection of data objects that are similar to one another and dissimilar to the objects in other clusters. Given a set of data objects $DS = \{X_1, X_2, \ldots, X_N\}$, where $X_j = (x_j^1, x_j^2, \ldots, x_j^L)$ and $L$ is the dimension of a data object, a clustering problem seeks a K-partition of $DS$, $C = \{C_1, C_2, \ldots, C_K\}$, such that the similarity of the data objects within the same cluster is maximized and the difference between data objects belonging to different clusters is maximized. The objective function of the clustering problem is evaluated based on the sum of squared error (SSE), which is defined as

International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015). © 2015. The authors. Published by Atlantis Press.
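The definition of SSE is cut off in the text above. In its commonly used form, the sum of squared error of a partition is $SSE = \sum_{k=1}^{K} \sum_{X_j \in C_k} \lVert X_j - Z_k \rVert^2$, where $Z_k$ is the centroid (mean) of cluster $C_k$. The symbol $Z_k$ and the exact form are assumptions here and may differ from the paper's own equation. A minimal sketch of this standard form, with hypothetical function names and toy data:

```python
def centroid(cluster):
    """Mean of a non-empty cluster, computed coordinate by coordinate."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def sse(partition):
    """Standard sum of squared error of a partition.

    `partition` is a list of non-empty clusters, each a list of points
    (tuples of equal dimension L). Assumed form, not taken from the paper.
    """
    total = 0.0
    for cluster in partition:
        z = centroid(cluster)
        for x in cluster:
            total += sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return total


# Two candidate 2-partitions of the same four points (hypothetical data).
compact = [[(0, 0), (0, 1)], [(4, 0), (4, 1)]]
stretched = [[(0, 0), (4, 0)], [(0, 1), (4, 1)]]

print(sse(compact))    # 1.0
print(sse(stretched))  # 16.0
```

A lower SSE indicates tighter clusters, so under this objective the first partition is preferred.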
