A new nearest neighbor-based framework for diabetes detection

Suyanto Suyanto, Selly Meliana, Tenia Wahyuningrum, Siti Khomsah

doi:10.1016/j.eswa.2022.116857

Abstract

Diabetes is one of the deadliest and costliest diseases. Today, automatic diabetes detection systems are primarily developed using deep learning (DL) approaches, which achieve high accuracy in classifying patients as diabetic or non-diabetic. Unfortunately, DL models are computationally complex and behave as unexplainable black boxes. This paper proposes a new nearest neighbor-based framework to tackle those issues in classifying two diabetes datasets: the binary-class Pima India Diabetes Dataset (PIDD) and the multiclass Diabetes Type dataset. First, k-means clustering (KMC) is carried out to remove noise and outliers and keep only the competent data in the training set. The dimension of the competent data is then reduced using an autoencoder (AE) to minimize intra-class distances while maximizing inter-class distances. A k-nearest neighbor (KNN) classifier and two of its variants, the pseudo nearest neighbor rule (PNNR) and the local mean-based pseudo nearest neighbor (LMPNN), are used to detect diabetes. In addition, a new variant named multi-voter multi-commission nearest neighbor (MVMCNN) is introduced. An investigation based on 5-fold cross-validation (5-FCV) shows that, for the binary-class PIDD, the proposed combination of KMC, AE, and MVMCNN achieves the highest accuracy of 99.13%, which is slightly higher than the 98.07% produced by the state-of-the-art DL-based detection model. An evaluation based on 10-fold cross-validation (10-FCV) also indicates that, for the multiclass Diabetes Type dataset, it obtains an accuracy of 95.24%, higher than the 94.02% given by the DL-based model for predicting diabetes.
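
To make the two existing KNN variants named above more concrete, the following is a minimal sketch of PNNR and LMPNN as they are commonly described in the nearest neighbor literature. It is not the authors' implementation: the function names, the Euclidean distance, the 1/i weighting, the default k, and the toy data are all assumptions made for illustration. The KMC filtering step, the AE, and the proposed MVMCNN are left out because the abstract does not specify them.

```python
import numpy as np

def _class_neighbors(X, y, query, label, k):
    """Return the k points of class `label` nearest to `query`, sorted by distance."""
    pts = X[y == label]
    dist = np.linalg.norm(pts - query, axis=1)  # Euclidean distance (assumed)
    order = np.argsort(dist)[:k]
    return pts[order], dist[order]

def pnnr_predict(X, y, query, k=3):
    """PNNR sketch: per class, sum the k nearest distances weighted by 1/i."""
    scores = {}
    for label in np.unique(y):
        _, dist = _class_neighbors(X, y, query, label, k)
        weights = 1.0 / np.arange(1, len(dist) + 1)
        scores[label] = float(np.sum(weights * dist))
    # The class whose "pseudo neighbor" is closest wins.
    return min(scores, key=scores.get)

def lmpnn_predict(X, y, query, k=3):
    """LMPNN sketch: like PNNR, but distances go to local means of the first i neighbors."""
    scores = {}
    for label in np.unique(y):
        pts, _ = _class_neighbors(X, y, query, label, k)
        counts = np.arange(1, len(pts) + 1)
        # Row i-1 holds the mean of the i nearest same-class neighbors.
        local_means = np.cumsum(pts, axis=0) / counts[:, None]
        dist = np.linalg.norm(local_means - query, axis=1)
        scores[label] = float(np.sum(dist / counts))
    return min(scores, key=scores.get)

# Hypothetical toy data: two well-separated 2-D classes (not from PIDD).
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
label_near_origin = pnnr_predict(X_toy, y_toy, np.array([0.5, 0.5]))
label_far_corner = lmpnn_predict(X_toy, y_toy, np.array([5.5, 5.5]))
```

Both rules score each class separately and pick the class with the smallest score, so a class can win with a few very close neighbors even if it is a minority among the overall k nearest points.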

Full Text