Prediction of Diabetes Mellitus using Machine Learning Algorithms: Comparative Analysis of K-Nearest Neighbor, Random Forest and Logistic Regression

A.M Adeshina

doi:10.56471/slujst.v6i.319

Abstract

Diabetes Mellitus is a chronic disease and one of the deadliest. It increases the risk of long-term complications, including heart disease and kidney failure. Patients with Diabetes Mellitus may live longer and lead healthier lives if the disease is detected early. Over the years, considerable effort has gone into more accurate and earlier detection procedures to save patients with Diabetes Mellitus. With the application of Information Technology to disease diagnosis and therapy management, increasing attention has been given to machine learning for the prediction and early detection of Diabetes Mellitus. However, determining which machine learning algorithm achieves the best accuracy remains a challenge. This study proposes a framework for Diabetes Mellitus detection using machine learning algorithms. The framework was evaluated using K-Nearest Neighbor (KNN), Random Forest (RF), and Logistic Regression (LR). Extensive experiments were conducted to analyze the performance of the framework on four distinct clinical datasets. To ensure a robust, web-compatible framework, Python and its popular data science packages Pandas, NumPy, Seaborn, Matplotlib, and Pickle were used for the implementation. On the standard datasets obtained from the National Institute of Diabetes and Digestive and Kidney Diseases, Random Forest predicted Diabetes Mellitus with the best accuracy, 93.4%.
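The comparison described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: scikit-learn is assumed here as the modelling library (the abstract names only Pandas, NumPy, Seaborn, Matplotlib and Pickle), the data are synthetic stand-ins rather than the clinical datasets, and the function name `compare_classifiers`, the 75/25 split and all hyperparameters are hypothetical choices made for illustration.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_classifiers(X, y, seed=0):
    """Fit KNN, RF and LR on one stratified split; return test accuracies.

    Hypothetical helper: the split ratio and hyperparameters are assumptions,
    not values reported in the paper.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        # KNN and LR are scale-sensitive, so they are wrapped with a scaler.
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    best = max(scores, key=scores.get)  # name of the most accurate model
    return scores, best, models[best]


# Synthetic binary-classification data standing in for a clinical dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
scores, best_name, best_model = compare_classifiers(X, y)

# Pickle round trip, e.g. to hand the fitted model to a web application.
restored = pickle.loads(pickle.dumps(best_model))
```

The accuracies in `scores` depend entirely on the synthetic data and say nothing about the 93.4% reported for Random Forest. The Pickle round trip shows only one plausible way a fitted model could be passed to a web front end.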
