Abstract
The accuracy of automated diabetes prediction models built on a patient's past health records depends heavily on the correctness of the underlying data. When patient data is inconsistent and contains many missing values, prediction becomes more challenging. This paper evaluates the impact of missing value imputation (MVI) techniques on diabetes prediction when the data contains missing values. The experiments are performed on the Pima Indians diabetes dataset, which contains many missing values. First, MVI techniques are used to handle the missing values. Second, K-Means clustering is used to identify the best imputation technique based on the percentage of incorrectly classified instances in each imputed dataset. Third, principal component analysis (PCA) is used for feature extraction, and Info Gain is used to select the optimal set of features. Six classification models are used in the experiments: multi-layer perceptron (MLP), support vector machine (SVM), Naive Bayes (NB), decision tree (J48), AdaBoost, and Bagging. Eight techniques are used for missing value imputation: CMC, Case Deletion, KMI, SVMI, WKNNI, KNNI, FKMI, and MC. The experimental results show that the Case Deletion and KMI imputed datasets have the lowest number of incorrectly classified instances. When the six classifiers are applied to these two datasets, the MLP classifier attains the highest accuracy, 98.9967% with the Case Deletion imputed dataset and 99.2767% with the KMI imputed dataset, when six principal components are used. The other classifiers in the comparison obtain accuracies ranging from 93% to 98%.
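To make the difference between discarding and filling in incomplete records concrete, the following minimal sketch contrasts Case Deletion with a simple column-mean fill. It is an illustration only, not the implementation used in the experiments: the toy values are hypothetical and not drawn from the Pima Indians dataset, the function names `case_deletion` and `mean_imputation` are assumptions introduced here, and the column-mean fill is a simple stand-in rather than one of the eight evaluated techniques.

```python
from statistics import mean

# Toy records (hypothetical values, not taken from the Pima Indians dataset).
# None marks a missing entry.
rows = [
    [148.0, 72.0, 33.6],
    [85.0, None, 26.6],
    [183.0, 64.0, None],
    [89.0, 66.0, 28.1],
]


def case_deletion(data):
    """Drop every record that has at least one missing value."""
    return [row for row in data if all(v is not None for v in row)]


def mean_imputation(data):
    """Fill each missing value with the mean of the observed values in its column.

    This is a simple stand-in for illustration, not one of the paper's methods.
    """
    columns = list(zip(*data))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in data
    ]


deleted = case_deletion(rows)    # keeps only the fully observed records
imputed = mean_imputation(rows)  # keeps every record, with gaps filled in

print(len(deleted), "records remain after case deletion")
print(len(imputed), "records remain after mean imputation")
```

Case deletion shrinks the dataset but leaves only observed values, whereas imputation preserves the sample size at the cost of introducing estimated values; this is the trade-off that the comparison of imputed datasets examines.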