Abstract

K-Means is a simple clustering algorithm that starts from a random partition and repeatedly reassigns samples to clusters based on the similarity between data points. However, K-Means has several disadvantages: the initial cluster centers are chosen randomly, and conventional distance models used to measure similarity give every data attribute the same influence. This study attempts to improve the performance of K-Means by combining Principal Component Analysis (PCA) and Rapid Centroid Estimation (RCE). PCA determines the weight of each data attribute based on the eigenvalues, and RCE determines the initial cluster centers. The proposed method is evaluated on three datasets from the UCI Repository: ionosphere, iris, and wine. Performance is measured only in terms of mean squared error (MSE) and sum of squared errors (SSE). The results indicate that PCA and RCE improve the performance of K-Means: the largest improvement in MSE, 56.76%, was found on the iris data, while the largest improvement in SSE, 86.08%, occurred on the ionosphere data.
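
To make the two modifications concrete, the following is a minimal sketch of K-Means with a per-attribute weighted distance and caller-supplied initial centers. It is not the authors' implementation, and several parts are assumptions: the weights are hypothetical constants standing in for PCA-derived values (the abstract does not say how eigenvalues are mapped to attribute weights), the initial centers are hand-picked samples standing in for the output of RCE, and SSE and MSE are assumed to be the weighted sum of squared distances to the assigned center and that sum divided by the number of samples. All function names are illustrative.

```python
def weighted_sq_dist(a, b, weights):
    """Squared Euclidean distance with one weight per attribute."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def weighted_kmeans(data, centers, weights, max_iter=100):
    """Lloyd-style K-Means with a weighted distance and given initial centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest center.
        new_labels = [
            min(range(len(centers)),
                key=lambda j: weighted_sq_dist(p, centers[j], weights))
            for p in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def sse_mse(data, centers, labels, weights):
    """Assumed definitions: SSE is the total weighted squared distance of
    samples to their assigned center; MSE is SSE divided by sample count."""
    sse = sum(weighted_sq_dist(p, centers[lab], weights)
              for p, lab in zip(data, labels))
    return sse, sse / len(data)


# Toy data with two visible groups (not one of the UCI datasets).
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
weights = [0.7, 0.3]          # hypothetical stand-in for PCA-derived weights
initial = [data[0], data[3]]  # hypothetical stand-in for RCE-estimated centers

centers, labels = weighted_kmeans(data, initial, weights)
sse, mse = sse_mse(data, centers, labels, weights)
print(labels, sse, mse)
```

Passing the weights and initial centers in as arguments keeps the clustering loop independent of how they are produced, so a PCA-based weighting or an RCE-based initializer could be substituted without changing the loop itself.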
