Abstract

Diabetes is one of the major chronic diseases in the world. It afflicts 246 million people, and according to a World Health Organisation (WHO) report this figure will rise to 380 million by 2025. If the disease is not diagnosed or remains unidentified, many other debilitating and critical health issues may develop. Machine Learning (ML) techniques are now used in fields such as education, healthcare, business and recommendation systems. Healthcare data is complex, high-dimensional and contains irrelevant information, which lowers prediction accuracy. This research used the Pima Indians Diabetes Dataset, which consists of 768 records. First, the missing values are replaced by the median, and Linear Discriminant Analysis (LDA) is then applied. Using the Python programming language, feature selection is combined with five classification algorithms: Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Logistic Regression, Random Forest and Decision Tree. The aim of this paper is to compare these classification algorithms in order to predict diabetes in patients more accurately. K-fold cross-validation is applied with k set to 2, 4, 5 and 10. The performance measures are accuracy, precision, recall, F1 score and area under the curve. Our study found that the MLP classifier gave the highest accuracy, 78.7%, with a recall of 61.26%, a precision of 72.45% and an F1 score of 65.97% for k = 4.
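The evaluation protocol described in the abstract can be sketched with scikit-learn. This is only an illustration under stated assumptions, not the authors' code: the data is a synthetic stand-in generated with `make_classification` (the Pima dataset is not bundled here), the hyperparameters are library defaults or arbitrary choices, the scaling step and the shuffled `KFold` are assumptions, and the names `CLASSIFIERS`, `SCORING` and `evaluate` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): 768 records, 8 features, binary label.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           weights=[0.65, 0.35], random_state=0)
# Blank out about 5% of the entries to imitate missing values.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# The five classifier families named in the abstract; settings are assumptions.
CLASSIFIERS = {
    "SVM": SVC(),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
SCORING = ["accuracy", "precision", "recall", "f1", "roc_auc"]


def evaluate(X, y, k_values=(2, 4, 5, 10)):
    """Return the mean cross-validated score per (classifier, k) pair."""
    results = {}
    for name, clf in CLASSIFIERS.items():
        # Median imputation, scaling (assumption), LDA projection, classifier.
        # With two classes LDA projects onto a single component.
        model = make_pipeline(
            SimpleImputer(strategy="median"),
            StandardScaler(),
            LinearDiscriminantAnalysis(n_components=1),
            clf,
        )
        for k in k_values:
            cv = KFold(n_splits=k, shuffle=True, random_state=0)
            scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
            results[(name, k)] = {
                metric: float(scores[f"test_{metric}"].mean())
                for metric in SCORING
            }
    return results


results = evaluate(X, y)
for (name, k), metrics in results.items():
    print(f"{name:20s} k={k:<2d} accuracy={metrics['accuracy']:.3f} "
          f"f1={metrics['f1']:.3f}")
```

Placing the imputer and LDA inside the pipeline means they are refit on the training folds only, so no information from the held-out fold leaks into preprocessing. The scores printed here describe the synthetic data and are not comparable to the figures reported in the abstract.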

Highlights

  • Diabetes is one of the most common chronic diseases in the world; it causes the blood sugar level to become too high [1]

  • Outliers were detected using the interquartile range (IQR), and those found were replaced with the median value

  • The linear discriminant analysis (LDA) feature selection technique was applied in order to extract the important features from the pre-processed dataset
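The IQR-based outlier handling mentioned in the highlights can be illustrated with a small standard-library sketch. The 1.5 × IQR fences, the inclusive quartile method, the use of the median of all values (outliers included), the function name `replace_outliers_with_median` and the sample readings are assumptions made for illustration; the summary does not state which conventions the authors used.

```python
from statistics import median, quantiles


def replace_outliers_with_median(values, whisker=1.5):
    """Replace values outside the IQR fences with the median of the column."""
    # Quartiles via the inclusive method (assumption; other conventions exist).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    # Median taken over all values, outliers included (assumption).
    med = median(values)
    return [med if (v < low or v > high) else v for v in values]


# Hypothetical readings for one feature; 300 lies far above the upper fence.
readings = [70, 72, 74, 76, 78, 80, 300]
cleaned = replace_outliers_with_median(readings)
print(cleaned)  # [70, 72, 74, 76, 78, 80, 76]
```

For these readings the quartiles are 73 and 79, so the fences are 64 and 88, and only the value 300 is replaced by the median, 76.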


Summary

Introduction

Diabetes is one of the most common chronic diseases in the world; it causes the blood sugar level to become too high [1]. It now ranks fifth among diseases as a cause of death [2]. It can also lead to other problems, such as an increased risk of heart attack and stroke [3]. The disease cannot be cured; the only option is to manage the glucose level in the blood.

