Abstract

The K-Means Clustering algorithm is commonly used by researchers in grouping data. The main problem in this study was that it has yet to be discovered how optimal the grouping with variations in distance calculations is in K-Means Clustering. The purpose of this research was to compare distance calculation methods with K-Means such as Euclidean Distance, Canberra Distance, Chebychev Distance, Cosine Similarity, Dynamic TimeWarping Distance, Jaccard Similarity, and Manhattan Distance to find out how optimal the distance calculation is in the K-Means method. The best distancecalculation was determined from the smallest Davies Bouldin Index value. This research aimed to find optimal clusters using the K-Means Clustering algorithm with seven distance calculations based on types of numerical measures. This research method compared distance calculation methods in the K-Means algorithm, such as Euclidean Distance, Canberra Distance, Chebychev Distance, Cosine Smilirity, Dynamic Time Warping Distance, Jaccard Smilirity and Manhattan Distance to find out how optimal the distance calculation is in the K-Means method. Determining the best distance calculation can be seen from the smallest Davies Bouldin Index value. The data used in this study was on cosmetic sales at Devi Cosmetics, consisting of cosmetics sales from January to April 2022 with 56 product items. The result of this study was a comparison of numerical measures in the K-Means Clustering algorithm. The optimal cluster was calculating the Euclidean distance with a total of 9 clusters with a DBI value of 0.224. In comparison, the best average DBI value was the calculation of the Euclidean Distance with an average DBI value of 0.265.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call