Non-linear missing data imputation for healthcare data via index-aware autoencoders.

Sadaf Kabir, Leily Farrokhvar

doi:10.1007/s10729-022-09597-1

Abstract

The availability of data in the healthcare domain provides great opportunities for discovering new or hidden patterns in medical data, which can ultimately lead to improved clinical decision making. Predictive models play a crucial role in extracting this previously unknown information from data. However, medical data often contain missing values that can degrade the performance of predictive models. Autoencoder models have been widely used as non-linear functions for imputing missing data in fields such as computer vision, transportation, and finance. In this study, we assess the shortcomings of autoencoder models for data imputation and propose modified models to improve imputation performance. To evaluate the proposed model, we compare it with five well-known imputation techniques on six medical datasets, using five classification methods. Through extensive experiments, we demonstrate that the proposed non-linear imputation model outperforms the other models at all missing ratios and leads to the highest disease classification accuracy on all datasets.
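The abstract does not spell out the index-aware architecture, so the sketch below illustrates only the generic idea it builds on: a denoising autoencoder learns to reconstruct each feature from the others, and missing cells are filled with its reconstructions. It is a minimal NumPy illustration under stated assumptions, not the authors' model. The assumptions are one tanh hidden layer, column-mean initialization of missing cells, random input masking as the corruption, missing-completely-at-random (MCAR) missingness at a chosen ratio, and illustrative hyperparameters. The helper names `inject_mcar` and `ae_impute` are hypothetical.

```python
import numpy as np


def inject_mcar(X, missing_ratio, rng):
    """Hide each entry independently with probability `missing_ratio` (MCAR).

    Returns a float copy of X with NaNs and the boolean mask of hidden entries.
    """
    hidden = rng.random(X.shape) < missing_ratio
    X_miss = X.astype(float).copy()
    X_miss[hidden] = np.nan
    return X_miss, hidden


def ae_impute(X_miss, n_hidden=4, epochs=2000, lr=0.1, drop_p=0.2, seed=0):
    """Hypothetical sketch: fill NaNs with a one-hidden-layer denoising autoencoder.

    Assumes every column has at least one observed value.
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X_miss)
    M = observed.astype(float)

    # Standardize each column using its observed values only; missing cells
    # become 0, which equals column-mean filling in standardized space.
    mu = np.nanmean(X_miss, axis=0)
    sd = np.nanstd(X_miss, axis=0)
    sd[sd == 0] = 1.0
    Z = np.where(observed, (X_miss - mu) / sd, 0.0)

    n, d = Z.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    n_obs = M.sum()

    for _ in range(epochs):
        # Denoising corruption: mean-fill a random subset of inputs so the
        # network must predict each feature from the remaining ones.
        keep = rng.random(Z.shape) >= drop_p
        Z_in = Z * keep
        H = np.tanh(Z_in @ W1 + b1)
        R = H @ W2 + b2
        # Gradient of the mean squared error over originally observed cells only.
        dR = 2.0 * (R - Z) * M / n_obs
        dH = (dR @ W2.T) * (1.0 - H ** 2)  # uses W2 before its update
        W2 -= lr * (H.T @ dR)
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * (Z_in.T @ dH)
        b1 -= lr * dH.sum(axis=0)

    # Reconstruct from the uncorrupted (mean-filled) input.
    R = np.tanh(Z @ W1 + b1) @ W2 + b2
    # Keep observed values exactly; replace only the missing cells.
    return np.where(observed, X_miss, R * sd + mu)


# Usage on synthetic correlated data: 200 rows, 6 features, 2 latent factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))
X_miss, hidden = inject_mcar(X, 0.2, rng)

X_imp = ae_impute(X_miss)
X_mean = np.where(hidden, np.nanmean(X_miss, axis=0), X_miss)  # mean baseline

rmse_ae = np.sqrt(np.mean((X_imp[hidden] - X[hidden]) ** 2))
rmse_mean = np.sqrt(np.mean((X_mean[hidden] - X[hidden]) ** 2))
print(f"RMSE on hidden cells: autoencoder={rmse_ae:.3f}, mean={rmse_mean:.3f}")
```

Scoring only the artificially hidden cells mirrors the usual protocol of masking known values at several missing ratios and comparing imputers. Repeating the run for a range of `missing_ratio` values is one way to probe behavior across degrees of missingness. Downstream classification accuracy, which the study also reports, would require training classifiers on the imputed data and is not shown here.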

Full Text