A Comparison of the Effects of Data Imputation Methods on Model Performance

Wooyoung Kim, Jaegul Choo, Cheonbok Park, Jangho Choi, Wonwoong Cho, Jiyong Kim

doi:10.23919/icact.2019.8702000

Abstract

Missing values cause critical problems when training a prediction model. Various missing-data imputation methods have been introduced to address this problem. However, the imputation accuracy achieved by these methods is not sufficient on its own to validate the performance of prediction models. Thus, in this study, we compare (1) the imputation accuracy of various imputation methods and (2) the effects of these methods on prediction accuracy, and we investigate the relationship between imputation accuracy and prediction accuracy. For the comparison, we use water quality data composed of the latest real-world observational multi-sensor data from Daecheong Lake. We conduct several experiments to compare seven imputation methods, including a state-of-the-art method, and their effects on three distinct prediction models. Through quantitative comparison and analysis, we show that it is necessary to consider both imputation accuracy and model prediction accuracy when choosing an imputation method.
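The two-sided comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic sine-shaped sensor signal, the two simple imputers (mean fill and last observation carried forward), the one-feature linear model, and all function names are stand-ins, not the seven methods, three prediction models, or Daecheong Lake data used in the study. For each imputer, it reports the imputation error on the masked entries alongside the prediction error of a model trained on the imputed feature.

```python
import math
import random


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def impute_mean(series):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def impute_locf(series):
    """Last observation carried forward; leading gaps use the first observed value."""
    last = next(v for v in series if v is not None)
    out = []
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out


def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def compare(seed=0, n=400, missing_rate=0.3):
    """Return {imputer name: (imputation RMSE, prediction RMSE)} on synthetic data."""
    rng = random.Random(seed)
    # Hypothetical smooth sensor reading and a target that depends on it linearly.
    x_true = [10 + 3 * math.sin(t / 15) + rng.gauss(0, 0.2) for t in range(n)]
    y = [2 * v + 1 + rng.gauss(0, 0.5) for v in x_true]
    # Mask values completely at random; keep the first reading observed.
    x_obs = [None if rng.random() < missing_rate else v for v in x_true]
    x_obs[0] = x_true[0]
    missing = [i for i, v in enumerate(x_obs) if v is None]
    split = n * 3 // 4  # chronological train/test split

    results = {}
    for name, imputer in [("mean", impute_mean), ("locf", impute_locf)]:
        x_imp = imputer(x_obs)
        # (1) Imputation accuracy: error on the masked entries only.
        imp_err = rmse([x_imp[i] for i in missing], [x_true[i] for i in missing])
        # (2) Prediction accuracy: train and evaluate on the imputed feature.
        slope, intercept = fit_line(x_imp[:split], y[:split])
        pred = [slope * v + intercept for v in x_imp[split:]]
        results[name] = (imp_err, rmse(pred, y[split:]))
    return results


if __name__ == "__main__":
    for name, (imp_err, pred_err) in compare().items():
        print(f"{name}: imputation RMSE={imp_err:.3f}, prediction RMSE={pred_err:.3f}")
```

Reporting both numbers side by side makes it possible to check whether the ranking of imputers by imputation error matches their ranking by downstream prediction error, which is the relationship the study sets out to examine.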

Full Text