Abstract

Historical and real-time healthcare data sets are valuable sources of information for predictive data analytics. However, most historical healthcare data sets pose significant challenges. One of the most frequently faced challenges is missing values, which arise from inaccuracies in data transmission or data entry processes. An appropriate technique for handling missing values is therefore required to generate good-quality data sets and achieve better prediction results. Removing the records with missing values, known as marginalization, offers an easy way out of this challenge. However, it reduces the volume of the historical data set and disturbs its class balance. An alternative to marginalization is replacing missing values with plausible values, known as imputation. This paper proposes a missing value imputation technique, CLUSTIMP, using the unsupervised neural network Adaptive Resonance Theory 2 (ART2). The efficiency of the proposed imputation method is evaluated on the incomplete Mammographic Mass data set and the Hepatocellular Carcinoma (HCC) data set from the UCI repository, using Root Mean Squared Error (RMSE) and classification accuracy as the evaluation metrics. The proposed CLUSTIMP imputation algorithm outperforms existing state-of-the-art imputation methods, reducing classifier error rates by between 2 and 11%.
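As a rough illustration of the two options contrasted above, the sketch below first drops incomplete records (marginalization) and then, as an alternative, fills each missing value from the nearest cluster centroid before scoring the filled values with RMSE. It is not CLUSTIMP: plain k-means stands in for the ART2 network, whose details the abstract does not give. The toy data, the `k=2` setting, the "true" held-out values, and all function names (`marginalize`, `partial_distance`, `kmeans`, `cluster_impute`, `rmse`) are hypothetical and chosen only for illustration.

```python
import math

def marginalize(rows):
    """Marginalization: keep only records with no missing (None) values."""
    return [r for r in rows if None not in r]

def partial_distance(row, centroid):
    """Euclidean distance over the observed (non-None) features of `row` only."""
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(row, centroid) if x is not None))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic init (first k points); a stand-in for ART2."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: partial_distance(p, centroids[i]))
            groups[j].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def cluster_impute(rows, k=2):
    """Cluster the complete records, then fill each missing value in an
    incomplete record with the matching feature of its nearest centroid."""
    centroids = kmeans(marginalize(rows), k)
    imputed = []
    for r in rows:
        if None not in r:
            imputed.append(list(r))
            continue
        best = min(centroids, key=lambda c: partial_distance(r, c))
        imputed.append([c if x is None else x for x, c in zip(r, best)])
    return imputed

def rmse(true_vals, imputed_vals):
    """Root mean squared error between held-out true values and imputed values."""
    n = len(true_vals)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals)) / n)

# Hypothetical toy records with two well-separated groups; None marks a missing value.
data = [
    [1.0, 2.0], [8.0, 9.0],
    [1.2, 1.8], [8.2, 9.1],
    [0.9, 2.1], [7.9, 8.8],
    [1.1, None], [None, 9.0],
]

complete = marginalize(data)          # 6 of 8 records survive
filled = cluster_impute(data, k=2)    # all 8 records kept, gaps filled
print(len(complete), filled[6], filled[7])
# Hypothetical "true" values for the two gaps, used only to show the RMSE call.
print(rmse([2.0, 8.1], [filled[6][1], filled[7][0]]))
```

Marginalization shrinks the toy set from 8 to 6 records, which is the data-volume loss the abstract describes, whereas the cluster-based fill keeps every record. Using only the observed features when measuring distance lets an incomplete record still be matched to a cluster.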
