Abstract

Missing data problems arise in almost every field, including biology, medicine, sensor networks, and surveys. Most existing imputation algorithms rely on the assumption that some complete data rows exist. This paper presents a modified deep autoencoder model that can impute values even when there are no complete rows, i.e., when a block of data is missing. The system is tested on three datasets: samples from a multivariate Gaussian distribution, a real-world wireless sensor network dataset, and the abalone dataset, with different percentages of block-missing values in each. The performance of the proposed deep autoencoder is compared against k-nearest neighbor (KNN)-based imputation and mean imputation, using root mean square error (RMSE) as the performance metric. The results indicate that the proposed system outperforms the commonly used mean imputation method. Moreover, when the dataset is large and its variables are well correlated, the deep autoencoder-based method achieves better results than both KNN and mean imputation.
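As an illustration of the evaluation setup described above, the following minimal sketch computes the RMSE of the mean imputation baseline on correlated Gaussian samples with a block of missing values. It is not the proposed autoencoder. The covariance matrix, sample size, block shape, and the helper names `rmse` and `mean_impute` are assumptions chosen for the example and do not come from the paper.

```python
import numpy as np


def rmse(truth, estimate, mask):
    """Root mean square error over the entries flagged True in mask."""
    diff = truth[mask] - estimate[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def mean_impute(data):
    """Replace each NaN with the mean of the observed values in its column."""
    filled = data.copy()
    col_means = np.nanmean(data, axis=0)
    rows, cols = np.where(np.isnan(data))
    filled[rows, cols] = col_means[cols]
    return filled


# Assumed setup: 1000 samples from a 3-variable correlated Gaussian.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8, 0.6],
                [0.8, 1.0, 0.7],
                [0.6, 0.7, 1.0]])
complete = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=1000)

# Hypothetical missing block: the first 200 rows lose columns 1 and 2.
mask = np.zeros_like(complete, dtype=bool)
mask[:200, 1:] = True
observed = complete.copy()
observed[mask] = np.nan

imputed = mean_impute(observed)
score = rmse(complete, imputed, mask)
print(f"mean-imputation RMSE on the missing block: {score:.3f}")
```

The error is measured only on the masked entries, since the observed entries are left unchanged by imputation. Any other imputation method can be scored the same way by swapping out `mean_impute`.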
