Comparing Machine Learning Algorithms to Predict Diabetes in Women and Visualize Factors Affecting It the Most—A Step Toward Better Health Care for Women

Ankur Saxena, Arushi Agarwal

doi:10.1007/978-981-15-1286-5_29

Abstract

Diabetes affects millions of people throughout the world, and more than half of the people suffering from it are women. A better tool for diagnosis and study would be a step toward better health care. We use sklearn to build models on the Pima Indians Diabetes Dataset. The main goal is to compare different algorithms and identify the one with the best accuracy. Predicting diabetes in women is crucial: it not only ensures an early start of treatment, but also helps with prevention where the probability of the disease occurring is high. We have focused not only on detection, but also on studying and visualizing the factors most strongly correlated with a positive diabetes diagnosis. By studying the most common algorithms, we can identify which areas need further work to develop better ways of delivering health care. Machine learning is actively used in health care, and applying it to a condition like diabetes, which affects a large share of the world's population, including almost 100 million Americans and more than 62 million Indians, can have a broad impact. The dataset was chosen for parameters and features that are determined not by geography or region but by the overall physiology of women, so that most women throughout the world can benefit. The algorithms compared are decision trees, logistic regression, Naive Bayes, SVM, and KNN. The best result was an accuracy of 81.1%, obtained with k-fold cross-validation.
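The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract does not state the hyperparameters, the preprocessing, or the number of folds, so the default settings, the feature scaling, and the 10-fold stratified split here are all assumptions. Synthetic data with the same shape as the Pima dataset (768 rows, 8 features) stands in for the real file, and `compare_models` is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, n_splits=10, seed=0):
    """Return the mean k-fold cross-validated accuracy of each classifier."""
    # The five algorithm families named in the abstract, with default
    # hyperparameters (an assumption). Scaling is applied inside a pipeline
    # for the distance- and margin-based models so that it is refit on
    # each training fold.
    models = {
        "decision tree": DecisionTreeClassifier(random_state=seed),
        "logistic regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "naive Bayes": GaussianNB(),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    }
    # Stratified folds keep the class ratio similar in every split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


# Placeholder data only: replace with the real features and outcome column.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)
scores = compare_models(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {acc:.3f}")
```

Accuracies printed for the synthetic data say nothing about the reported 81.1%; the sketch only shows how the five models can be scored under one shared cross-validation split so that their accuracies are directly comparable.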

Full Text