The K-nearest neighbor (KNN) classifier uses a distance metric to measure how far a test instance lies from each training sample. With smaller training sets, the KNN classifier achieves high accuracy at low computational cost. However, because classifying a test instance requires computing its distance to every training sample, the computational time increases for high-dimensional datasets. This research employs sequential feature selection (SFS) to select the optimal features for diabetes prediction while reducing the computational time of the KNN classifier. The KNN classifier achieved an accuracy of 84.41% with nine features. Its performance improved by 2.6% when trained on the optimal features selected by SFS. The results identified glucose level, blood pressure (BP), skin thickness (ST), diabetes pedigree function (DPF), age, and body mass index (BMI) as the most representative features for diabetes prediction; the KNN classifier gives higher accuracy with these features. In contrast, insulin and the number of pregnancies showed no significant effect on the KNN classifier.
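The combination of SFS and KNN described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes greedy forward selection scored by leave-one-out accuracy, Euclidean distance, and k = 3, and it runs on small synthetic data (two informative features, two large-scale noise features) rather than the diabetes dataset. The function names `knn_predict`, `loo_accuracy`, and `forward_sfs` are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training points, with Euclidean
    distance computed over the given feature indices only."""
    dists = sorted(
        (math.sqrt(sum((row[f] - x[f]) ** 2 for f in features)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of KNN restricted to `features`."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], features, k) == y[i]
    return hits / len(X)


def forward_sfs(X, y, n_select, k=3):
    """Greedy forward selection: at each step, add the remaining feature
    that gives the highest leave-one-out accuracy."""
    selected = []
    remaining = list(range(len(X[0])))
    while len(selected) < n_select:
        best = max(remaining, key=lambda f: loo_accuracy(X, y, selected + [f], k))
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data (an assumption for illustration, not the study's data):
# features 0 and 2 separate the classes; features 1 and 3 are unscaled noise.
rng = random.Random(0)
X, y = [], []
for i in range(60):
    label = i % 2
    X.append([
        6.0 * label + rng.gauss(0, 1),   # informative
        rng.uniform(-100, 100),          # noise
        -6.0 * label + rng.gauss(0, 1),  # informative
        rng.uniform(-100, 100),          # noise
    ])
    y.append(label)

selected = forward_sfs(X, y, n_select=2)
acc_selected = loo_accuracy(X, y, selected)
acc_all = loo_accuracy(X, y, list(range(4)))
print("selected features:", sorted(selected))
print("accuracy (selected):", acc_selected)
print("accuracy (all features):", acc_all)
```

Because each distance is computed only over the selected indices, dropping features lowers the per-query cost as well as removing dimensions that distort the neighborhood, which is the trade-off the abstract describes.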