Abstract

In the past decades, there is a wide increase in the number of people affected by diabetes, a chronic illness. Early prediction of diabetes is still a challenging problem as it requires clear and sound datasets for a precise prediction. In this era of ubiquitous information technology, big data helps to collect a large amount of information regarding healthcare systems. Due to explosion in the generation of digital data, selecting appropriate data for analysis still remains a complex task. Moreover, missing values and insignificantly labeled data restrict the prediction accuracy. In this context, with the aim of improving the quality of the dataset, missing values are effectively handled by three major phases such as (1) pre-processing, (2) feature extraction, and (3) classification. Pre-processing involves outlier rejection and filling missing values. Feature extraction is done by a principal component analysis (PCA) and finally, the precise prediction of diabetes is accomplished by implementing an effective distance adaptive-KNN (DA-KNN) classifier. The experiments were conducted using Pima Indian Diabetes (PID) dataset and the performance of the proposed model was compared with the state-of-the-art models. The analysis after implementation shows that the proposed model outperforms the conventional models such as NB, SVM, KNN, and RF in terms of accuracy and ROC.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call