A Normalized Mean Algorithm for Imputation of Missing Data Values in Medical Databases

G Madhu, K Sai Vardhan, B Lalith Bharadwaj, G Naga Chandrika

doi:10.1007/978-981-15-3172-9_72

Abstract

Medical research databases commonly contain missing values, and their presence has a negative impact on machine learning models: data with missing values can degrade classifier performance and can introduce bias that leads to wrong insights. Imputation approaches are typically employed to fill in missing values before data analysis. Imputation also helps in building an effective classification model that can discover hidden patterns and provide insightful outcomes. In this paper, a normalized mean imputation approach is designed to fill missing values in numerical datasets. After normalizing the data, the method computes the mean and the cube-root-of-cubic mean. It then imputes each missing value with the larger of these two quantities, which is taken as the plausible value for the given dataset. It is also observed that some outliers are eliminated from the dataset after imputation with this approach. Experiments are conducted on benchmark datasets, and the approach is compared with mean imputation, median imputation, and mode imputation. The experimental results show that the suggested imputation technique outperforms the compared methods.
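The procedure summarized in the abstract can be illustrated with a minimal Python sketch for a single numeric column. The abstract does not specify several details, so the following are assumptions made only for illustration: min-max normalization over the observed values, reading "cube-root-of-cubic mean" as the cube root of the mean of cubes, and mapping the chosen value back to the original scale before filling. The function name `normalized_mean_impute` is hypothetical and is not taken from the paper.

```python
import math
from typing import List, Optional


def normalized_mean_impute(column: List[Optional[float]]) -> List[float]:
    """Fill None entries in one numeric column (illustrative sketch only)."""
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("column has no observed values")

    lo, hi = min(observed), max(observed)
    span = hi - lo
    if span == 0:
        # Constant column: every observed value is the same, so reuse it.
        return [lo if x is None else x for x in column]

    # Assumption: min-max normalization to [0, 1] over observed values.
    normed = [(x - lo) / span for x in observed]

    n = len(normed)
    mean = sum(normed) / n
    # Assumption: "cube-root-of-cubic mean" = cube root of the mean of cubes.
    cubic_mean = (sum(v ** 3 for v in normed) / n) ** (1.0 / 3.0)

    # Take the larger of the two candidates, as the abstract describes.
    fill_normed = max(mean, cubic_mean)

    # Assumption: map the fill value back to the original scale.
    fill = lo + fill_normed * span
    return [fill if x is None else x for x in column]


if __name__ == "__main__":
    print(normalized_mean_impute([1.0, None, 3.0, 5.0]))
```

Under the min-max assumption all normalized values are nonnegative, so by the power mean inequality the cube root of the mean of cubes is never smaller than the arithmetic mean, and the maximum selects it. A different normalization (for example, z-scores, which can be negative) could change which candidate is larger.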

Full Text