Early Stage Diabetes Prediction Using Machine Learning Methods

Özge Nur ERGÜN, Hamza O. İLHAN

doi:10.31590/ejosat.1015816

Abstract

Diabetes is a common disease that is incurable and potentially fatal. Millions of people worldwide have diabetes, and it directly affects their lives. Early diagnosis helps reduce the effects of diabetes and improves patients' quality of life, but people commonly live with diabetes for years before being diagnosed. Early diagnosis can be supported by applying machine learning methods to existing patient data. In this way, people can be screened quickly without taking a glucose screening test or any other blood test: answering a simple set of questions would be enough to determine whether a person is diabetic or at risk of becoming diabetic. In this study, diabetes is detected with machine learning techniques. A publicly available diabetes dataset, comprising 16 features collected from 520 people, was used to build predictive models. Eight machine learning methods were applied individually to the dataset, and the results of each model were validated with 10-fold cross-validation. In addition to accuracy, other confusion-matrix-based performance metrics (precision, recall, and F1 score) are reported. All of the models achieved high accuracy. The lowest accuracy, 88.85%, was obtained with Naive Bayes, one of the basic machine learning techniques. The highest accuracy, 99.04%, was obtained with a one-dimensional convolutional neural network (1D CNN) model. The 1D CNN model also achieved the highest scores on the other metrics: 100.00%, 98.63%, and 99.31% for precision, recall, and F1 score, respectively. These findings indicate that the 1D CNN model can be used to identify diabetic patients by asking them only a few questions.
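
The evaluation protocol summarized in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `kfold_indices`, `f1_score`, and `precision_recall_f1` are hypothetical, the folds are contiguous and unshuffled for simplicity (the abstract does not say how the folds were drawn), and the confusion-matrix counts passed in the last call are made-up illustrative values, not results from the study. Only the dataset size (520), the number of folds (10), and the reported precision and recall come from the abstract.

```python
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds.

    Assumption: no shuffling or stratification; in practice the rows
    would usually be shuffled before splitting.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Derive precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_score(precision, recall)


# With 520 samples and 10 folds, every test fold holds 52 samples.
fold_sizes = [len(test) for _, test in kfold_indices(520, 10)]

# Consistency check on the reported 1D CNN scores (in percent).
reported_f1 = round(f1_score(100.00, 98.63), 2)

# Hypothetical counts, for illustration only.
example_scores = precision_recall_f1(tp=9, fp=1, fn=1)
```

With the reported precision of 100.00% and recall of 98.63%, the harmonic mean rounds to 99.31%, which agrees with the F1 score stated in the abstract.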
