Abstract

(1) Background: Diabetes is a common chronic disease and a leading cause of death. Early diagnosis gives patients with diabetes the opportunity to improve their dietary habits and lifestyle and to manage the disease successfully. Several studies have explored the use of machine learning (ML) techniques to predict and diagnose this disease. In this study, we conducted experiments to predict diabetes in Pima Indian females using four ML classifiers. (2) Method: The Pima Indian diabetes dataset (PIDD), comprising 768 female patients, was used for this study. Several data mining operations were performed to conduct a comparative analysis of four ML classifiers: Naïve Bayes (NB), J48, Logistic Regression (LR), and Random Forest (RF). The models were evaluated using k-fold cross-validation with k = 5, 10, 15, and 20, and accuracy, precision, F-score, recall, and AUC were calculated for each model. (3) Results: LR had the highest accuracy (0.77) for all values of k. For k = 5, the accuracy of J48, NB, and RF was 0.71, 0.76, and 0.75, respectively. For k = 10, the accuracy of J48, NB, and RF was 0.73, 0.76, and 0.74, respectively. For k = 15 and k = 20, the accuracy of NB was 0.76. The accuracy of both J48 and RF was 0.76 for k = 15 and 0.75 for k = 20. Precision, F-score, recall, and AUC were also considered when ranking the algorithms. (4) Conclusion: The present study on PIDD sought to identify an optimized ML model using cross-validation methods. The AUC was 0.83 for LR, 0.82 for RF, and 0.81 for NB. These three were ranked as the best models for predicting whether a patient is diabetic or not.
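The evaluation protocol described in the abstract (k-fold cross-validation with several values of k, scored by accuracy, precision, recall, and F-score) can be sketched in plain Python. This is only an illustrative sketch, not the authors' pipeline: the labels are synthetic placeholders (768 entries with an assumed 500/268 class split that is not stated in the abstract), the helper names `stratified_k_fold`, `binary_metrics`, `cross_validate`, and `majority_baseline` are hypothetical, and a majority-class baseline stands in for the NB, J48, LR, and RF models. Whether the study stratified its folds is also an assumption. AUC is omitted because the baseline produces no scores to rank.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal indices round-robin so every fold gets a share of each class.
        for j, i in enumerate(idxs):
            folds[(j + offset) % k].append(i)
        offset += len(idxs)
    return folds


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F-score from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f_score": f_score}


def cross_validate(labels, k, fit_predict, seed=0):
    """Average each metric over the k held-out folds."""
    per_fold = []
    for test_idx in stratified_k_fold(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        preds = fit_predict(train_idx, test_idx)
        truth = [labels[i] for i in test_idx]
        per_fold.append(binary_metrics(truth, preds))
    return {name: sum(m[name] for m in per_fold) / k for name in per_fold[0]}


def majority_baseline(labels):
    """Placeholder 'model': always predicts the most common training label."""
    def fit_predict(train_idx, test_idx):
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        return [majority] * len(test_idx)
    return fit_predict


# Synthetic placeholder labels (assumed class split, not taken from the abstract).
labels = [0] * 500 + [1] * 268

results = {k: cross_validate(labels, k, majority_baseline(labels))
           for k in (5, 10, 15, 20)}
for k, metrics in results.items():
    print(k, {name: round(value, 3) for name, value in metrics.items()})
```

To compare real classifiers, `fit_predict` would be replaced by a function that trains a model on the training indices and returns its predictions for the test indices; the averaging over folds stays the same.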

Highlights

  • Diabetes is a common chronic disease occurring when the pancreas does not produce enough insulin (Type 1 diabetes) or when the patient’s body does not effectively utilize the insulin (Type 2 diabetes)

  • Plasma glucose concentration had the highest information gain of the attributes examined, suggesting that it is the strongest risk factor for diabetes

  • We developed four binary classifier models: Naïve Bayes (NB), J48, Logistic Regression (LR), and Random Forest (RF), and each model was analyzed using different cross-validation (CV) values
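The information gain mentioned in the highlights can be illustrated with a small worked example. The sketch below computes the entropy-based gain of a single threshold split. The feature values, labels, and thresholds are toy numbers invented for illustration, not PIDD records, and the function names are hypothetical. The study's own attribute ranking may have used a different discretization, so this shows the general idea only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Weighted entropy of the two partitions after the split.
    remainder = sum(len(part) / n * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder


# Toy data (hypothetical): one feature separates the classes, one does not.
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
glucose = [85, 90, 100, 110, 140, 150, 160, 170]
skin_thickness = [20, 30, 20, 30, 20, 30, 20, 30]

gain_glucose = information_gain(glucose, outcome, threshold=125)
gain_skin = information_gain(skin_thickness, outcome, threshold=25)
print(gain_glucose, gain_skin)
```

In this toy setting the first feature splits the two classes perfectly and so recovers the full one bit of label entropy, while the second leaves both partitions evenly mixed and gains nothing. An attribute with the highest gain is the one whose values carry the most information about the outcome.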


Summary

Introduction

Diabetes is a common chronic disease occurring when the pancreas does not produce enough insulin (Type 1 diabetes) or when the patient’s body does not effectively utilize the insulin (Type 2 diabetes). Hyperglycemia or raised blood sugar is the common consequence of uncontrolled diabetes. Diabetes can cause severe damage to nerves and blood vessels [1]. Advanced diabetes is complicated by coronary illness, visual impairment, and kidney failure [1,2]. Detection of the disease can give patients the opportunity to make the necessary lifestyle changes and can improve their life expectancy [3].

