Abstract

Choosing the initial centroids is a challenging problem in the k-means method because the choice can affect the clustering results. Moreover, a given set of starting centroids is not always appropriate, especially as the number of clusters increases. Random initialization is often used to address this problem, but it produces different solutions from run to run because the initial centroids are drawn at random. We therefore propose the Distance Part’s (DP) method to solve the centroid initialization problem in k-means (DP-KMeans). DP-KMeans is a new approach to centroid initialization: the data are sorted by their distance to a reference point, from largest to smallest, and then partitioned according to that ordering. The method is named DP-KMeans because the partitioning is based on the sorted distances to the reference point. In this study, four datasets from the UCI Machine Learning Repository are used to evaluate the proposed method. The results show that the proposed method achieves the lowest sum of squared errors, with values for k = 4 of 10.606, 705.144, 13.450, and 97.767. We conclude that DP-KMeans can improve k-means performance with respect to the initial centroid problem.
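
The abstract gives only an outline of the initialization, so the following Python is a minimal sketch of one possible reading, not the authors' implementation. Two details are assumptions made here for illustration: the reference point defaults to the per-feature minimum of the data, and each initial centroid is taken as the mean of one of k nearly equal, consecutive partitions of the distance-sorted samples. The names `dp_init` and `sse` are hypothetical.

```python
import numpy as np


def dp_init(X, k, reference=None):
    """Deterministic initial centroids from distance-sorted partitions (a sketch)."""
    X = np.asarray(X, dtype=float)
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of samples")
    if reference is None:
        # ASSUMPTION: the abstract does not say which reference point is used;
        # the per-feature minimum is chosen here purely for illustration.
        reference = X.min(axis=0)
    # Euclidean distance of every sample to the reference point.
    dist = np.linalg.norm(X - reference, axis=1)
    # Sort from largest to smallest distance, as the abstract describes.
    order = np.argsort(-dist, kind="stable")
    # ASSUMPTION: split the sorted samples into k nearly equal consecutive
    # parts and use the mean of each part as one initial centroid.
    parts = np.array_split(order, k)
    return np.vstack([X[idx].mean(axis=0) for idx in parts])


def sse(X, centroids):
    """Sum of squared errors: squared distance of each sample to its nearest centroid."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())
```

Because no random draw is involved, repeated calls on the same data return the same centroids, which is the property the abstract contrasts with random initialization. The resulting array could then be handed to a standard k-means routine as its starting centroids, for example through the `init` argument of scikit-learn's `KMeans` with `n_init=1`.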
