Improvement of K-Means Performance Using a Combination of Principal Component Analysis and Rapid Centroid Estimation

Sapriadi Sapriadi, Sutarman Sutarman, E B Nababan

doi:10.1088/1742-6596/1230/1/012003


Open Access


Abstract

K-Means is a simple clustering algorithm that starts from a random partition and repeatedly reassigns samples to clusters based on the similarity between data points. However, K-Means has several disadvantages: the initial cluster centers are chosen randomly, and conventional distance models used to measure similarity give every data attribute the same influence. This study attempts to improve the performance of K-Means by combining Principal Component Analysis (PCA) and Rapid Centroid Estimation (RCE). PCA determines the weight of each data attribute based on the eigenvalues, and RCE determines the initial cluster centers. The proposed method is evaluated on three datasets obtained from the UCI Repository: ionosphere, iris, and wine. Performance is measured only by mean squared error (MSE) and sum of squared errors (SSE). The results indicate that combining PCA and RCE improves the performance of K-Means: the largest improvement in MSE, 56.76%, was found on the iris data, while the largest improvement in SSE, 86.08%, occurred on the ionosphere data.
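
The idea of attribute weighting can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the weight formula in `pca_attribute_weights` (absolute PCA loadings averaged by explained-variance ratio) is one hypothetical way to turn eigenvalues into attribute weights, MSE is assumed to be SSE divided by the number of samples, and both are computed with the weighted distance. RCE is not implemented here; a plain random choice of data points stands in for the initial centers. The function names and the synthetic data are illustrative only.

```python
import numpy as np


def pca_attribute_weights(X):
    """Hypothetical weighting: absolute loadings averaged by explained variance.

    Assumes X has at least two attributes and is not constant.
    """
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # columns of eigvecs are components
    eigvals = np.clip(eigvals, 0.0, None)       # guard against tiny negative values
    ratio = eigvals / eigvals.sum()             # explained-variance ratio
    weights = np.abs(eigvecs) @ ratio           # one score per attribute
    return weights / weights.sum()              # normalise to sum to 1


def weighted_sq_dist(X, centroids, weights):
    """Weighted squared Euclidean distance from every sample to every centroid."""
    diff = X[:, None, :] - centroids[None, :, :]
    return (weights * diff ** 2).sum(axis=2)


def weighted_kmeans(X, k, weights, n_iter=100, seed=0):
    """K-Means with a weighted distance; random initial centers (RCE stand-in)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = weighted_sq_dist(X, centroids, weights).argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:                # keep the old center if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and error measures against the final centers.
    d2 = weighted_sq_dist(X, centroids, weights)
    labels = d2.argmin(axis=1)
    sse = float(d2[np.arange(len(X)), labels].sum())
    mse = sse / len(X)                          # assumed definition of MSE
    return labels, centroids, sse, mse


# Usage on synthetic data (not one of the UCI datasets used in the study).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),
               rng.normal(5.0, 1.0, size=(50, 4))])
w = pca_attribute_weights(X)
labels, centroids, sse, mse = weighted_kmeans(X, k=2, weights=w)
print("weights:", w)
print("SSE:", sse, "MSE:", mse)
```

Passing uniform weights (`np.full(X.shape[1], 1 / X.shape[1])`) gives a baseline for comparison, since uniform weights correspond to the conventional distance up to a constant factor.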
