Abstract

Diabetes is a global health concern, and its diagnosis and treatment are among the prime challenges for the medical fraternity. Because the disease can be controlled but not cured, the sooner it is diagnosed, the better for the patient. Using machine learning for timely classification of diabetes therefore plays a vital role in protecting patients from life-threatening complications later on. Various classification techniques are available in Machine Learning (ML), viz. Support Vector Machines (SVM), Random Forest (RF), the Naive Bayes (NB) classifier, Linear Regression (LR), the K-Nearest Neighbor (KNN) algorithm, etc. The question is which of these techniques identifies this sensitive disorder both promptly and accurately. When predicting diabetes with any machine learning algorithm, accuracy, specificity, and sensitivity are some of the important parameters. Improving these parameters requires an understanding of the dataset under consideration, i.e. whether it contains missing values or outliers; if missing values exist, data imputation techniques have to be applied to the dataset to strengthen prediction accuracy. In this work, the well-known Pima Indian dataset from the UCI repository was subjected to data imputation techniques to handle the missing values present in it (tabulated in Table-1). Thereafter, the said machine learning techniques were applied and compared on the basis of various parameters, viz. accuracy, sensitivity, specificity, etc. Choosing the best algorithm requires comparing multiple criteria together, which is quite challenging. Thus, in this work, Evaluation Based on Distance from Average Solution (EDAS), a Multi-Criteria Decision-Making (MCDM) technique, is applied. Applying EDAS to the performance evaluation statistics (speed, accuracy, specificity, and sensitivity) of the classification algorithms, viz. Naive Bayes (NB) classifier, Support Vector Machines (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), and Linear Regression (LR), ranks Naive Bayes (NB) as the best classifier and Random Forest (RF) as the second-best alternative for analyzing the Pima Indian dataset to predict diabetes.
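
To illustrate how an EDAS ranking of this kind can be computed, the following is a minimal sketch of the standard EDAS procedure, not the authors' implementation. The function name `edas_scores`, the equal weights, the alternative labels, and every number in the decision matrix are hypothetical placeholders, not results from this work. The sketch also assumes all criterion values are positive and treats a runtime column as a cost criterion (lower is better), which the abstract does not specify.

```python
import numpy as np


def edas_scores(matrix, weights, beneficial):
    """Return the EDAS appraisal score of each alternative (one per row).

    matrix     : alternatives x criteria, all values assumed positive
    weights    : one weight per criterion (normalised internally)
    beneficial : True where higher is better, False for cost criteria
    """
    x = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise weights so they sum to 1
    is_benefit = np.asarray(beneficial, dtype=bool)

    # Average solution: mean of each criterion over all alternatives.
    av = x.mean(axis=0)

    # Relative deviation from the average, sign-flipped for cost criteria
    # so that a positive value always means "better than average".
    dev = np.where(is_benefit, x - av, av - x) / av

    pda = np.maximum(dev, 0.0)   # positive distance from average
    nda = np.maximum(-dev, 0.0)  # negative distance from average

    sp = pda @ w  # weighted sum of positive distances
    sn = nda @ w  # weighted sum of negative distances

    # Normalise both sums; the guards cover the degenerate case in which
    # every alternative is identical on every criterion.
    nsp = sp / sp.max() if sp.max() > 0 else np.zeros_like(sp)
    nsn = 1.0 - sn / sn.max() if sn.max() > 0 else np.ones_like(sn)

    # Appraisal score: higher means a better-ranked alternative.
    return (nsp + nsn) / 2.0


# Hypothetical placeholder values, NOT results from the paper.
# Columns: accuracy, sensitivity, specificity, runtime in seconds.
alternatives = ["A", "B", "C"]
decision_matrix = [
    [0.78, 0.70, 0.82, 0.5],
    [0.75, 0.65, 0.80, 2.0],
    [0.72, 0.60, 0.78, 1.0],
]
scores = edas_scores(
    decision_matrix,
    weights=[1, 1, 1, 1],                    # assumption: equal weights
    beneficial=[True, True, True, False],    # assumption: runtime is a cost
)
ranking = [alternatives[i] for i in np.argsort(-scores)]
print(ranking)
```

The alternative with the highest appraisal score is ranked first. In practice the weights and the benefit/cost designation of each criterion would have to come from the study design rather than from the defaults assumed here.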
