Early Prediction of Diabetes Mellitus Using Machine Learning

Gaurav Tripathi, Rakesh Kumar

doi:10.1109/icrito48877.2020.9197832

Abstract

Diabetes mellitus is a harmful disease that causes abnormal blood glucose levels due to the body's impaired production of the hormone insulin. If it is not diagnosed early, it affects various organs in the body, such as the kidneys, nerves, and eyes. With advances in technology, people are increasingly drawn to personalized healthcare. Machine learning is a rapidly growing field in predictive analysis and is often used in healthcare applications to identify diseases and their symptoms at an early stage. The main objective of this work is to build a model for early prediction of diabetes using machine learning classification algorithms, taking into account the significant features related to diabetes. The proposed model gives results close to clinical outcomes and also helps in the personalized diagnosis of patients. Four machine learning algorithms are used in the predictive analysis of early-stage diabetes: Linear Discriminant Analysis (LDA), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF). The Pima Indian Diabetes Database (PIDD), taken from the UCI Machine Learning Repository at the University of California, Irvine, is used for the experimental analysis. The performance of these classification algorithms is evaluated using several statistical measures: sensitivity (recall), precision, specificity, F-score, and accuracy. Accuracy is the proportion of instances that are classified correctly. The experimental results show that Random Forest (RF) gives the maximum accuracy of 87.66% and outperforms the other classification algorithms.
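As an illustration of the performance measures named above, the following minimal sketch computes sensitivity, precision, specificity, F-score, and accuracy from the four cells of a binary confusion matrix, using their standard definitions. The function name `classification_metrics` and the counts in the example are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from confusion-matrix counts.

    tp: diabetic patients predicted diabetic (true positives)
    fp: non-diabetic patients predicted diabetic (false positives)
    tn: non-diabetic patients predicted non-diabetic (true negatives)
    fn: diabetic patients predicted non-diabetic (false negatives)
    """

    def ratio(numerator, denominator):
        # Return 0.0 when a measure is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall: share of actual positives found
    precision = ratio(tp, tp + fp)     # share of predicted positives that are correct
    specificity = ratio(tn, tn + fp)   # share of actual negatives found
    f_score = ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)

    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "f_score": f_score,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not results from the paper.
    metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Reporting all five measures together matters on a medical dataset because accuracy alone can hide a high rate of missed positive cases, which sensitivity exposes directly.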

Full Text