Tackling Initial Centroid of K-Means with Distance Part (DP-KMeans)

Ahmad Ilham, Luqman Assaffat, Danny Ibrahim, Achmad Solichan

doi:10.1109/sain.2018.8673364

Abstract

Choosing the initial centroids is a challenging problem in the k-means method because the choice affects the clustering result, and an inappropriate starting centroid becomes more likely as the number of clusters increases. Random initialization is often used to address this problem, but because the starting centroids are drawn at random, it produces a variety of solutions. We therefore propose the Distance Part (DP) method to solve the centroid initialization problem in k-means (DP-KMeans). DP-KMeans is a new approach to centroid initialization: the data are sorted by their distance to a reference point, from largest to smallest, and then partitioned in that order. The method takes its name from this partitioning of the distance-sorted data. In this study, four datasets from the UCI Machine Learning Repository are used to evaluate the proposed method. The results show that the proposed method performs well, with lowest sum-of-squared-error values for k = 4 of 10.606, 705.144, 13.450, and 97.767. We conclude that DP-KMeans can improve k-means performance on the initial centroid problem.
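The idea of partitioning distance-sorted data can be illustrated with a minimal Python sketch. The abstract does not state how the reference point is chosen, how the partitions are sized, or how a centroid is derived from each partition, so the sketch assumes a per-feature-minimum reference point, k near-equal contiguous partitions, and the partition mean as the centroid. The names dp_initial_centroids and sse are hypothetical and are not taken from the paper.

```python
import math


def dp_initial_centroids(points, k, reference=None):
    """Illustrative distance-partition initialization (not the authors' code)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    dim = len(points[0])
    # Assumption: default reference point is the per-feature minimum.
    if reference is None:
        reference = [min(p[d] for p in points) for d in range(dim)]
    # Sort from largest to smallest distance to the reference point.
    ordered = sorted(points, key=lambda p: math.dist(p, reference), reverse=True)
    # Assumption: cut the sorted data into k near-equal contiguous parts
    # and use the mean of each part as an initial centroid.
    n = len(ordered)
    centroids = []
    for i in range(k):
        part = ordered[i * n // k:(i + 1) * n // k]
        centroids.append([sum(p[d] for p in part) / len(part) for d in range(dim)])
    return centroids


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    init = dp_initial_centroids(data, k=2)
    print(init, sse(data, init))
```

Because no random draw is involved, repeated calls on the same data return the same starting centroids, which is the property that distinguishes this kind of initialization from random seeding. The centroids would then be passed to a standard k-means loop as its starting point.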

Full Text