Abstract

Diabetes is one of the deadliest and costliest diseases. Today, automatic diabetes detection systems are developed primarily with deep learning (DL) approaches, which achieve high accuracy in classifying patients as diabetic or non-diabetic. Unfortunately, DL models are computationally complex and behave as unexplainable black boxes. This paper proposes a new nearest neighbor-based framework to address these issues in classifying two diabetes datasets: the binary-class Pima Indians Diabetes Dataset (PIDD) and the multiclass Diabetes Type dataset. First, k-means clustering (KMC) is applied to remove noise and outliers so that only competent data remain in the training set. The dimension of the competent data is then reduced using an autoencoder (AE) to minimize intra-class distances while maximizing inter-class distances. A k-nearest neighbor (KNN) classifier and two of its variants, the pseudo nearest neighbor rule (PNNR) and the local mean-based pseudo nearest neighbor (LMPNN), are used to detect diabetes. In addition, a new variant named multi-voter multi-commission nearest neighbor (MVMCNN) is introduced. An investigation based on 5-fold cross-validation (5-FCV) shows that, for the binary-class PIDD, the proposed combination of KMC, AE, and MVMCNN achieves the highest accuracy of 99.13%, slightly higher than the 98.07% of the state-of-the-art DL-based detection model. An evaluation based on 10-FCV also indicates that, for the multiclass Diabetes Type dataset, it obtains an accuracy of 95.24%, higher than the 94.02% of the DL-based model for predicting diabetes.
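
As an illustration of one of the nearest neighbor variants named above, the following is a minimal sketch of a pseudo nearest neighbor rule. It assumes the commonly described formulation, in which each class is scored by the sum of distances from the query to its k nearest members, with the i-th nearest weighted by 1/i, and the class with the smallest score wins. The function name `pnnr_predict`, the use of Euclidean distance, and the toy data are illustrative assumptions and are not taken from the paper. The KMC, AE, and MVMCNN stages are not shown.

```python
import math
from collections import defaultdict


def pnnr_predict(train_X, train_y, query, k=3):
    """Sketch of a pseudo nearest neighbor rule (assumed formulation).

    Each class is scored by the rank-weighted sum of distances from the
    query to that class's k nearest training points; lowest score wins.
    """
    # Group Euclidean distances to the query by class label.
    dists_by_class = defaultdict(list)
    for x, y in zip(train_X, train_y):
        dists_by_class[y].append(math.dist(x, query))

    best_label, best_score = None, math.inf
    for label, dists in dists_by_class.items():
        nearest = sorted(dists)[:k]
        # The i-th nearest neighbor (1-based) gets weight 1/i.
        # Note: a class with fewer than k points sums fewer terms.
        score = sum(d / i for i, d in enumerate(nearest, start=1))
        if score < best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: two well-separated classes in 2-D.
toy_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
toy_y = [0, 0, 0, 1, 1, 1]
```

Calling `pnnr_predict(toy_X, toy_y, (0.5, 0.5), k=2)` assigns the query to class 0, since its two nearest class-0 points are far closer than any class-1 point. In a pipeline like the one described, the inputs would be the AE-reduced features of the competent training data rather than raw attributes.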
