Comparison of Distance Measurements Based on k-Numbers and Its Influence to Clustering

Alyauma Hajjah, Dadang Priyanto, Deny Jollyta, Prihandoko Prihandoko, Yulvia Nora Marlim

doi:10.30812/matrik.v23i1.3078

Abstract

Heuristic data requires appropriate clustering methods so that the information produced by the grouping process is not called into doubt. Determining the optimal cluster from grouping results remains challenging. This study aimed to analyze four numerical distance measurements against the patterns of currently available categorical data, in order to give users of heuristic data recommendations for deriving knowledge or information from the best clusters. The method was clustering with four distance measurements (Euclidean, Canberra, Manhattan, and Dynamic Time Warping), optimized with the Elbow approach. The Elbow method with Sum of Squared Errors (SSE) was used to determine the optimal cluster. The number of clusters tested ranged from k = 2 to k = 10. Testing used data on students' use of social media to help them achieve higher GPAs, collected from 300 completed questionnaires. The results showed that Manhattan distance was the best numerical measurement, with the largest SSE of 45.359 and optimal clustering at k = 5. The optimal cluster generated with Manhattan distance was made up of students with GPAs above 3.00, with websites/vlogs used as learning tools by the mathematics and computer department. Variations in the number of clusters change the proximity of attributes, which can affect each cluster's ability to produce information.
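The four distance measurements and the SSE-based Elbow step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the zero-denominator convention for Canberra, the absolute-difference local cost for Dynamic Time Warping, and the "largest second difference" Elbow rule are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def canberra(a, b):
    # Each term is |x - y| / (|x| + |y|).
    # Assumption: terms with a zero denominator contribute 0.
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total

def dtw(a, b):
    # Dynamic Time Warping between two 1-D sequences,
    # using |x - y| as the local cost (an assumption).
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def sse(points, centroids, dist):
    # Sum of squared distances from each point to its nearest centroid.
    return sum(min(dist(p, c) for c in centroids) ** 2 for p in points)

def elbow_k(sse_by_k):
    # Hypothetical Elbow rule: pick the k where the SSE curve bends most,
    # i.e. where the drop before k exceeds the drop after k by the most.
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        bend = (sse_by_k[prev] - sse_by_k[k]) - (sse_by_k[k] - sse_by_k[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k
```

In use, one would cluster the data for each k from 2 to 10 under each distance, record `sse(points, centroids, dist)` per k, and pass the resulting dictionary to `elbow_k`. The study's own criterion for the optimal k may differ from the second-difference rule sketched here.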
