Abstract

Diabetes is a chronic disease that occurs when the pancreas no longer produces insulin or when the body cannot effectively use the insulin it produces. This study analyzes and compares the classification performance of the K-Nearest Neighbor (K-NN) method on a diabetes patient dataset using four distance metrics: Euclidean, Manhattan, Minkowski, and Hamming. Previous research reported performance values that did not exceed 80%, so this study tests these distance metrics in search of higher performance and compares the results with those earlier studies. Performance was evaluated from the confusion matrix under 10-fold cross-validation. Euclidean distance achieved its best performance at k = 17, with an accuracy of 85.71%, precision of 86.24%, recall of 85.71%, and F-measure of 85.12%. Manhattan distance achieved its best performance at k = 25, with an accuracy of 85.53%, precision of 85.54%, recall of 85.53%, and F-measure of 85.10%. Minkowski distance achieved its best performance at k = 17, with an accuracy of 85.71%, precision of 86.24%, recall of 85.71%, and F-measure of 85.12%. Hamming distance, by contrast, achieved its best performance at k = 23, with an accuracy of 75.32%, precision of 79.27%, recall of 75.32%, and F-measure of 71.45%.
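The comparison above can be illustrated with a minimal pure-Python sketch of K-NN under the four distance metrics, scored with confusion-matrix metrics under 10-fold cross-validation. It is not the authors' implementation. The function names, the Minkowski exponent p (the abstract does not state which p was used), the treatment of Hamming distance as a count of mismatching feature values, and the unshuffled, unstratified fold assignment are all assumptions. With p = 2, Minkowski distance reduces to Euclidean distance, and with p = 1 it reduces to Manhattan distance. Precision, recall, and F-measure are averaged across classes weighted by class support. Under that averaging, recall always equals accuracy, which is consistent with every recall value reported above, although the abstract does not say which averaging was used.

```python
import math
from collections import Counter


def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p=3):
    # p=1 gives Manhattan and p=2 gives Euclidean.
    # The default p=3 is an arbitrary illustrative choice (assumption).
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def hamming(a, b):
    # Number of coordinates whose values differ
    # (assumed treatment of numeric features).
    return sum(1 for x, y in zip(a, b) if x != y)


def knn_predict(train_X, train_y, query, k, dist):
    # Rank training points by distance to the query,
    # then take a majority vote among the k nearest labels.
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def weighted_scores(y_true, y_pred):
    # Per-class precision, recall, and F1 from confusion-matrix counts,
    # averaged with each class weighted by its share of true samples.
    n = len(y_true)
    precision = recall = f1 = 0.0
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        pr = tp / (tp + fp) if tp + fp else 0.0
        rc = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * pr * rc / (pr + rc) if pr + rc else 0.0
        weight = (tp + fn) / n  # class support as a fraction of all samples
        precision += weight * pr
        recall += weight * rc
        f1 += weight * f
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return accuracy, precision, recall, f1


def cross_validate(X, y, k, dist, folds=10):
    # Each sample is predicted exactly once by a K-NN model
    # whose training set is every other fold.
    y_pred = [None] * len(X)
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            y_pred[i] = knn_predict(train_X, train_y, X[i], k, dist)
    return weighted_scores(y, y_pred)
```

For instance, cross_validate(X, y, k=17, dist=euclidean) evaluates Euclidean K-NN at k = 17 on a feature matrix X and label list y. Sweeping k and swapping dist for manhattan, hamming, or a lambda that wraps minkowski with a chosen p reproduces the shape of the comparison, not the reported numbers.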
