Abstract

This study presents a case study on, and the impact of, predicting early diabetes in the United States using a logistic regression model. We assess the ability of a machine learning algorithm (a binomial logistic model) to predict diabetes and examine the important features associated with the disease. We predict the test data from the important variables and evaluate prediction accuracy using the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC). From the correlation coefficient analysis, we deduce that, of the 16 PIE variables, only “Itching” and “Delayed healing” were not statistically significantly associated with the target variable (class), with values of 83% and 33%, respectively, while “Alopecia” and “Gender/Sex” were negatively correlated with the target variable. In addition, Lasso regularization was used to penalize the logistic regression model. The predictor variable “sudden_weight_loss” did not appear to be statistically significant in the model, and the predictor variables “Polyuria” and “Polydipsia” contributed most to the prediction of class “Positive”, based on their parameter values and odds ratios. The 95% confidence interval for the model's AUC lies between 93% and 99%, which indicates that the fitted model can predict diabetes status correctly.
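The two quantities the abstract relies on, the odds ratio of a logistic regression coefficient and the AUC, can be illustrated with a minimal sketch. It uses only the Python standard library; the function names, the toy labels, and the toy scores are hypothetical and are not taken from the study's data or code.

```python
import math


def odds_ratio(coefficient):
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(coefficient)


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = "Positive", 0 = "Negative";
# scores stand in for predicted probabilities from a fitted model.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = auc(labels, scores)
```

An odds ratio above 1 means the predictor raises the odds of the "Positive" class, and an AUC close to 1 means positive cases are almost always ranked above negative ones. The pairwise computation here is quadratic in the number of cases and is meant for illustration only.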
