Abstract: Diabetes is a chronic disease whose prevalence is increasing from year to year. Developing an automated system that can detect diabetic patients plays an important role in medical science. A stacking classifier is designed for the detection of diabetes by combining three base estimators (a Random Forest classifier, a LightGBM classifier, and a K-Nearest Neighbors classifier) with Logistic Regression as the meta-classifier. Data preprocessing includes transforming categorical variables into numerical format. Each base learner is trained on the preprocessed data and produces predictions. In the meta-learner stage, Logistic Regression is trained on the predictions of the base learners. The goal of the meta-learner is to learn how to combine those predictions into a more accurate final prediction of whether the patient is diabetic or not. The stacking classifier improves prediction accuracy compared to using a single classifier. The developed model achieves an accuracy of 98%. Five-fold cross-validation is used to obtain a more robust estimate of the generalization error. Thus, the developed model offers a means to enhance early detection and improve the quality of care for diabetic patients.