Comparison of performance of k-nearest neighbor algorithm using smote and k-nearest neighbor algorithm without smote in diagnosis of diabetes disease in balanced data

A G Pertiwi, R Kusumaningrum, I Waspada, A Wibowo, N Bachtiar

doi:10.1088/1742-6596/1524/1/012048

Abstract

According to the 2017 Indonesian Health Profile, diabetes is among the causes that account for almost 70% of deaths worldwide. This high mortality rate motivates efforts to reduce the number of people with diabetes through studies that support accurate diagnosis of the disease. This study compares the performance of the K-Nearest Neighbors (KNN) algorithm with and without the Synthetic Minority Over-sampling Technique (SMOTE) in diagnosing diabetes on imbalanced datasets. The parameters tested are the k value of KNN and the k value of SMOTE. Testing is carried out using K-Fold Cross Validation. The data comprise 3876 records from Pertamina Central Hospital. The results show that the accuracy of diagnosing diabetes with SMOTE is better than the accuracy without it, with a maximum accuracy increase of 8.25%. The highest average accuracy, 78.06%, is obtained with k = 3 in KNN, k = 5 in SMOTE, and fold = 10.
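The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hospital data are not reproduced here, so the script generates a small synthetic imbalanced dataset, and all function names (`nearest`, `smote`, `knn_predict`, `cross_val_accuracy`, `make_data`) are hypothetical. It also assumes that oversampling is applied only to the training portion of each fold and that the minority class is raised to parity with the majority class; the abstract does not state either choice.

```python
import math
import random
from collections import Counter


def nearest(points, x, k):
    """Return indices of the k points closest to x (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], x))
    return order[:k]


def smote(minority, k, n_new, rng):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours.
    Assumes the minority class has at least two samples."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Drop the first hit: it is the base point (or an exact duplicate).
        neighbours = nearest(minority, base, k + 1)[1:]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training samples."""
    votes = Counter(train_y[i] for i in nearest(train_X, x, k))
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, folds, knn_k, smote_k=None, seed=0):
    """Mean accuracy over `folds` folds. If smote_k is set, label 1 (assumed
    to be the minority class) is oversampled inside each training fold."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    scores = []
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train_X = [X[i] for i in idx if i not in held_out]
        train_y = [y[i] for i in idx if i not in held_out]
        if smote_k is not None:
            minority = [x for x, label in zip(train_X, train_y) if label == 1]
            # Number of synthetic points needed to match the majority class.
            n_new = max(len(train_y) - 2 * len(minority), 0)
            new = smote(minority, smote_k, n_new, rng)
            train_X = train_X + new
            train_y = train_y + [1] * len(new)
        hits = sum(knn_predict(train_X, train_y, X[i], knn_k) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / folds


def make_data(n_major=180, n_minor=20, seed=1):
    """Synthetic two-feature data standing in for the real records."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_major)]
    X += [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(n_minor)]
    y = [0] * n_major + [1] * n_minor
    return X, y


X, y = make_data()
plain = cross_val_accuracy(X, y, folds=10, knn_k=3)
with_smote = cross_val_accuracy(X, y, folds=10, knn_k=3, smote_k=5)
print(f"KNN only: {plain:.4f}  KNN + SMOTE: {with_smote:.4f}")
```

The parameter values in the last lines (k = 3 for KNN, k = 5 for SMOTE, 10 folds) mirror the best configuration reported in the abstract, but the accuracies printed for synthetic data say nothing about the figures reported for the hospital dataset.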

Full Text