Abstract

Type II Diabetes Mellitus (T2DM) has become an increasingly prevalent disease due to the growing obese population. Even though this disorder is the leading cause of more severe health complications, studies have shown that T2DM is largely preventable if detected at an early phase. To improve current diagnosis and prognosis procedures, this study examines three machine learning algorithms, namely K-Nearest Neighbor (KNN), Random Forest (RF), and Support Vector Classification (SVC), and their potential to make accurate predictions on the outcomes of the Pima Indians Diabetes Dataset (PIDD). After training and 5-fold cross-validation, the results show that the RF algorithm has the highest accuracy at 75.25%, followed by SVC at 74.91% and KNN at 71.01%. In addition, feature importances were evaluated for all three models, yet the top-ranked features differed drastically from one model to another, which implies that more training and larger datasets are necessary before these computational approaches can be put into practice. Nevertheless, the potential highlighted in this study demonstrates that machine learning is a burgeoning strategy for clinical use and for solving real-world problems.
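
As a rough illustration of the evaluation protocol described above, the following minimal sketch runs 5-fold cross-validation for a small K-Nearest Neighbor classifier using only the Python standard library. It is not the study's code: the synthetic two-class data is a hypothetical stand-in for PIDD, and the helper names (`make_synthetic`, `knn_predict`, `k_fold_indices`, `cross_val_accuracy`), the choice of k = 5 neighbors, and the fixed random seeds are all assumptions made for the example. In practice a machine learning library would normally supply these pieces, and RF or SVC would slot into the same loop in place of the KNN predictor.

```python
import random
from collections import Counter


def make_synthetic(n=200, n_features=8, seed=0):
    """Hypothetical stand-in data: two Gaussian classes with numeric features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so the classes are balanced
        # Class 1 is shifted by +1 on every feature relative to class 0.
        X.append([rng.gauss(float(label), 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n, n_folds=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_val_accuracy(X, y, n_folds=5, k=5):
    """Return one held-out accuracy per fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        # Train on every sample that is not in the current held-out fold.
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return scores


X, y = make_synthetic()
scores = cross_val_accuracy(X, y)
mean_accuracy = sum(scores) / len(scores)
print("fold accuracies:", [round(s, 3) for s in scores])
print(f"mean accuracy: {mean_accuracy:.3f}")
```

Each sample is held out exactly once, so the mean of the five fold accuracies is the kind of summary figure the abstract reports for each model; the numbers printed here reflect only the synthetic data and say nothing about PIDD.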
