Development of Early Stage Diabetes Prediction Model Based on Stacking Approach

Ilkay Cinar, Murat Koklu, Yavuz Selim Taspinar

doi:10.31803/tg-20211119133806

Abstract

Diabetes is a disease that may pose direct or indirect risks to human health. Early diagnosis can minimize the potential harm of this disease to the body and reduce the probability of death. For this reason, laboratory tests are performed on diabetic patients, and the analysis of these tests enables the diagnosis of diabetes. The aim of this study is to diagnose diabetes quickly by applying machine learning methods to data obtained from patients. To diagnose the disease, k-nearest neighbor (k-NN), logistic regression (LR) and random forest (RF) models were used, together with a stacking meta-model created by combining these three models. The dataset used in the research includes test samples taken from 520 people. It has 17 attributes: 16 input features and 1 output feature. Classification on this dataset produced different results for each model. The classification accuracies of the LR, k-NN, RF and stacking models were found to be 91.3%, 91.7%, 97.9% and 99.6%, respectively. F-score, precision and recall were used as performance metrics for a detailed analysis of the models' classification results. The results indicate that the stacking model performs well enough to be used as a decision support system in the early diagnosis of diabetes.
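The stacking arrangement described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. This is not the authors' implementation. The data below is synthetic (generated with `make_classification` to mimic only the shape of 520 samples and 16 input features), and the choice of logistic regression as the meta-learner, the 5-fold internal cross-validation, the 75/25 split and all hyperparameters are assumptions made for illustration, so the printed scores bear no relation to the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 520 samples, 16 input features,
# one binary output. This is NOT the dataset used in the study.
X, y = make_classification(
    n_samples=520, n_features=16, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three base models named in the abstract; hyperparameters are assumed.
base_models = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# Stacking: the meta-learner is trained on cross-validated predictions
# of the base models. The meta-learner choice here is an assumption.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# The metrics mentioned in the abstract, computed for the positive class.
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f_score, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary"
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f_score={f_score:.3f}")
```

Replacing the synthetic arrays with the real feature matrix and labels would be the only change needed to try the same arrangement on actual patient data, although the resulting scores would depend on preprocessing and validation choices that the abstract does not specify.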
