Abstract

Non-insulin-dependent diabetes mellitus, or Type 2 diabetes, is a serious disease that affects many people. Every year, approximately 2 to 5 million people die from diabetes. If diabetes is predicted early, it can be controlled, and life-threatening complications such as cardiac stroke, nephropathy and other associated disorders can be prevented. Early prediction of diabetes therefore helps in maintaining good health. With recent developments in machine learning (ML), ML is being applied to many areas of medicine and health care. The Pima Indian Diabetes (PID) data set used in this paper was acquired from the UCI repository. In this study, after thorough data pre-processing and feature engineering with feature importance methods such as Random Forest importance and recursive feature elimination (RFE), we applied several machine learning models, namely KNN, Logistic Regression, SVM, Random Forest, LightGBM and XGBoost, with train-test splits of 60–40, 70–30 and 80–20 to predict Type 2 diabetes mellitus. Among all models, the highest accuracy, 91.47%, was obtained by the LightGBM model with the 80–20 train-test split.
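
The evaluation protocol described above (one classifier scored under 60–40, 70–30 and 80–20 train-test splits) can be illustrated with a minimal sketch. This is not the paper's code: it uses a small synthetic two-class data set rather than the PID data, a plain-Python k-nearest-neighbors classifier standing in for the models listed, and hypothetical helper names (`make_synthetic`, `split_rows`, `knn_predict`, `evaluate_splits`). The accuracies it prints say nothing about the results reported in the paper.

```python
import random
from collections import Counter


def make_synthetic(n=200, seed=0):
    """Generate a toy two-class data set (an assumption; NOT the PID data)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2                          # alternate classes: balanced data
        center = 0.0 if label == 0 else 3.0    # class 1 is shifted away from class 0
        features = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
        rows.append((features, label))
    return rows


def split_rows(rows, train_fraction, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, x, k=5):
    """Predict the majority label among the k nearest training rows."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def evaluate_splits(rows, fractions=(0.6, 0.7, 0.8), k=5):
    """Score the same classifier under each train fraction."""
    results = {}
    for fraction in fractions:
        train, test = split_rows(rows, fraction)
        results[fraction] = accuracy(train, test, k)
    return results


if __name__ == "__main__":
    for fraction, acc in evaluate_splits(make_synthetic()).items():
        train_pct = round(fraction * 100)
        print(f"{train_pct}-{100 - train_pct} split: accuracy = {acc:.3f}")
```

A single random split per ratio, as here, gives a noisy estimate; repeating each split with several seeds or using cross-validation would make comparisons between ratios more reliable.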
