Diabetes is a chronic disease that is rarely detected early and develops quickly. It can trigger other chronic conditions such as kidney failure and heart disease. Early detection is necessary to help patients treat diabetes before it becomes more severe. Various health examinations can detect diabetes, but they must be performed by medical experts and cannot be carried out by just anyone. In addition, examination costs are often unaffordable. This research aims to apply data mining methods, specifically k-Nearest Neighbor (KNN), to the early detection of diabetes based on disease symptoms and patient clinical data. KNN is used to classify patient symptoms and clinical data into two classes, diabetes and non-diabetes, by calculating the Euclidean distance between test data and training data. The results show that a lower k-value yields higher accuracy. However, accuracy at low k-values alone is insufficient to judge the performance of KNN for early diabetes detection. High accuracy at low k-values may indicate overfitting, meaning the model does not generalize well. With a low k-value, the model sees patterns from only one or a few neighbors, so the overall pattern of the data is not captured. Conversely, a k-value that is too high risks underfitting: the model becomes too general, which makes it unreliable. This research used k-fold cross-validation to address these issues, which helps avoid overfitting in the constructed KNN model. The researchers used 10-fold and 20-fold cross-validation and examined the accuracy at each iteration of the k and k-fold values. The higher the k-fold value, the higher the accuracy KNN produces.
In contrast to the trend for the k-fold cross-validation value, the higher the k-value in KNN, the lower the accuracy. The KNN method applied in this research achieves an accuracy of 98.2692%, with higher precision than recall. These findings suggest that KNN can be an effective and efficient tool for early diabetes detection.
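
The procedure described above (KNN with Euclidean distance, evaluated with 10-fold and 20-fold cross-validation over several k-values) can be illustrated with a minimal sketch. It is not the authors' implementation. The synthetic two-feature data, the helper names (`euclidean`, `knn_predict`, `cross_val_accuracy`, `make_toy_data`), and the k-values tried are all assumptions made for illustration, so the accuracies it prints bear no relation to the figures reported in this study.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds, seed=0):
    """Mean accuracy of KNN across n_folds cross-validation folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices into n_folds roughly equal folds.
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in idx if i not in held_out]
        train_y = [y[i] for i in idx if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def make_toy_data(n_per_class=100, seed=1):
    """Synthetic two-class data (hypothetical, NOT the study's patient dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    # Class 0 stands in for non-diabetes, class 1 for diabetes.
    for label, centre in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for n_folds in (10, 20):
        for k in (1, 3, 5, 7, 9):
            acc = cross_val_accuracy(X, y, k, n_folds)
            print(f"folds={n_folds:2d}  k={k}  mean accuracy={acc:.4f}")
```

Odd k-values are used so that a two-class majority vote cannot tie. In practice the features would usually be scaled before computing distances, since Euclidean distance is sensitive to differing feature ranges.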