Abstract

Diabetes, a metabolic disease in which the blood glucose level rises over time, is one of the most common chronic diseases at present. It is critical to accurately predict and classify diabetes to reduce the severity of the disease and treat it early. One of the difficulties that researchers face is that diabetes datasets are limited and contain outliers and missing data. Additionally, there is a trade-off between classification accuracy and operational law for detecting diabetes. In this paper, an algorithm for diabetes classification is proposed for pregnant women using the Pima Indians Diabetes Dataset (PIDD). First, a preprocessing step in the proposed algorithm includes outlier rejection, imputing missing values, the standardization process, and feature selection of the attributes, which enhance the dataset’s quality. Second, the classifier uses the fuzzy KNN method and modifies the membership function based on the uncertainty theory. Third, a grid search method is applied to achieve the best values for tuning the fuzzy KNN method based on uncertainty membership, as there are hyperparameters that affect the performance of the proposed classifier. In turn, the proposed tuned fuzzy KNN based on uncertainty classifiers (TFKNN) deals with the belief degree, handles membership functions and operation law, and avoids making the wrong categorization. The proposed algorithm performs better than other classifiers that have been trained and evaluated, including KNN, fuzzy KNN, naïve Bayes (NB), and decision tree (DT). The results of different classifiers in an ensemble could significantly improve classification precision. The TFKNN has time complexity O(kn2d), and space complexity O(n2d). The TFKNN model has high performance and outperformed the others in all tests in terms of accuracy, specificity, precision, and average AUC, with values of 90.63, 85.00, 93.18, and 94.13, respectively. Additionally, results of empirical analysis of TFKNN compared to fuzzy KNN, KNN, NB, and DT demonstrate the global superiority of TFKNN in precision, accuracy, and specificity.

Highlights

  • The term “diabetes” refers to a group of metabolic diseases most notably related to glucose metabolism

  • During the phases of the proposed algorithm, different phases are conducted to enrich the efficiency of the proposed algorithm, and various ML methods, such as K Nearest Neighbors (KNN), naïve Bayes (NB), decision tree (DT), and fuzzy KNN, that have been trained and evaluated

  • A model for medical diagnosis should care about accuracy and how certain the prediction is

Read more

Summary

Introduction

The term “diabetes” refers to a group of metabolic diseases most notably related to glucose metabolism. Carbohydrates obtained by the body from bread, potatoes, rice, cakes, and a variety of other meals are progressively broken down and destroyed [1] This disintegration and breakdown process begins in the stomach and continues through the duodenum and other segments of the small intestine. The specific origin of diabetes mellitus is uncertain, scientists believe that both environmental and genetic factors contribute to the condition [3] It is incurable, it can be managed with medications and medicines [4]. Individuals with DM are at risk for the development of further health problems. Diagnosis and treatment of DM will help prevent complications and reduce the risk of serious health problems [5]. There have been numerous advances in recent technologies, such as machine learning, fuzzy methods [6,7], and deep learning; Figure 1 illustrates these technologies, adopted from [8]

Objectives
Methods
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call