Handling Missing Data Using Combination of Deletion Technique, Mean, Mode and Artificial Neural Network Imputation for Heart Disease Dataset

Anita Desiani, Novi Rustiana Dewi, Muhammad Nawawi, Naufal Rachmatullah, Annisa Nur Fauza, Muhammad Arhami

doi:10.26554/sti.2021.6.4.303-312


Open Access



Abstract

The University of California Irvine (UCI) Heart Disease dataset has missing data in several attributes. Missing data can cause important information in those attributes to be lost, yet it cannot simply be deleted from the dataset. Missing data can be handled in several ways, including deletion, imputation with the mean or mode, and prediction methods. In this study, an attribute was deleted if more than 70% of its values were missing. Attributes with 1% or less missing data were imputed with the mean and mode method, and an Artificial Neural Network (ANN) was used to impute attributes with more than 1% missing data. The effectiveness of these techniques was measured by the performance of classification methods on the dataset after its missing data had been handled. The classification methods used were Artificial Neural Network, Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor. The performance of each method without handling missing data was compared with its performance after imputation in terms of accuracy, sensitivity, specificity, and ROC. In addition, the Mean Squared Error (MSE) was compared to assess how close the predicted labels were to the original labels. The Artificial Neural Network obtained the lowest MSE, indicating that it performed better than the other methods on the dataset after missing data had been handled. The accuracy, sensitivity, and specificity of each classification method showed that imputing missing data can improve classification performance, especially for the Artificial Neural Network.
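The threshold rule described above (delete above 70% missing, mean/mode at or below 1%, ANN otherwise) can be sketched in Python as follows. This is a minimal, stdlib-only illustration, not the authors' implementation. The dictionary-of-columns layout, the hypothetical `handle_missing` and `train_mlp` helpers, using the mean for numeric and the mode for categorical attributes, and the tiny one-hidden-layer network (its size, learning rate, epochs, and the use of the complete attributes as its inputs) are all assumptions. Categorical attributes imputed by the network are snapped to the nearest observed category code, which assumes they are stored as numbers.

```python
import math
import random
from statistics import mean, mode, pstdev

# Thresholds from the abstract; the data layout, network and all other
# details below are assumptions made for this sketch.
DROP_THRESHOLD = 0.70    # drop an attribute if more than 70% of its values are missing
SIMPLE_THRESHOLD = 0.01  # mean/mode imputation if at most 1% of its values are missing


def missing_ratio(column):
    """Fraction of entries that are missing (None) in a non-empty column."""
    return sum(v is None for v in column) / len(column)


def standardize(values):
    """Return (mean, std) used for scaling; std falls back to 1.0 if constant."""
    return mean(values), (pstdev(values) or 1.0)


def train_mlp(X, y, hidden=6, epochs=150, lr=0.01, seed=0):
    """Hypothetical helper: train a one-hidden-layer tanh network with
    per-sample gradient descent on squared error; return a predict function."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W1, b1)]
        return h, sum(w * hj for w, hj in zip(W2, h)) + b2

    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward(x)
            err = out - t  # derivative of 0.5 * (out - t)**2 w.r.t. out
            for j in range(hidden):
                grad_h = err * W2[j] * (1.0 - h[j] ** 2)  # uses W2[j] before its update
                W2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for i in range(n_in):
                    W1[j][i] -= lr * grad_h * x[i]
            b2 -= lr * err

    return lambda x: forward(x)[1]


def handle_missing(data, categorical):
    """Hypothetical helper. data maps attribute name -> list of values, with
    None marking a missing value; categorical is a set of attribute names.
    Returns a new dict; the input is not modified."""
    # 1) Deletion: drop attributes with more than 70% missing values.
    data = {c: list(col) for c, col in data.items()
            if missing_ratio(col) <= DROP_THRESHOLD}

    # 2) Mean (numeric) or mode (categorical) for attributes with <= 1% missing.
    for c in list(data):
        col = data[c]
        if 0 < missing_ratio(col) <= SIMPLE_THRESHOLD:
            observed = [v for v in col if v is not None]
            fill = mode(observed) if c in categorical else mean(observed)
            data[c] = [fill if v is None else v for v in col]

    # 3) ANN imputation for the remaining incomplete attributes (>1%, <=70%),
    #    using the complete attributes, standardized, as input features.
    predictors = [c for c in data if missing_ratio(data[c]) == 0]
    targets = [c for c in data if missing_ratio(data[c]) > 0]
    if targets and not predictors:
        raise ValueError("no complete attributes available as ANN inputs")
    scale = {c: standardize(data[c]) for c in predictors}

    def features(i):
        return [(data[c][i] - scale[c][0]) / scale[c][1] for c in predictors]

    for c in targets:
        col = data[c]
        known = [i for i, v in enumerate(col) if v is not None]
        y_mu, y_sd = standardize([col[i] for i in known])
        model = train_mlp([features(i) for i in known],
                          [(col[i] - y_mu) / y_sd for i in known])
        levels = sorted({col[i] for i in known})
        filled = []
        for i, v in enumerate(col):
            if v is None:
                pred = model(features(i)) * y_sd + y_mu  # undo target scaling
                if c in categorical:  # snap to the nearest observed category code
                    pred = min(levels, key=lambda level, p=pred: abs(level - p))
                v = pred
            filled.append(v)
        data[c] = filled
    return data
```

Calling `handle_missing(columns, categorical={...})` returns a new dictionary in which attributes above the 70% threshold are gone and every remaining value is filled. The hand-written network only keeps the sketch dependency-free; in practice an established machine-learning library's multilayer perceptron would normally take its place.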

Full Text