This paper deals with imputation techniques and strategies. Imputation proper usually begins after the initial data editing, although many preparatory operations are needed before that point. In the editing step, missing or deficient items are identified and coded, and it is then decided which of them, if any, should be replaced by imputed values. Many imputation methods exist, each with several possible specifications. Consequently, it is not clear which method should finally be chosen, especially when one method is best in one respect and another method is best in another. In this paper, we consider these questions through the following four imputation methods: (i) random hot decking, (ii) logistic regression imputation, (iii) linear regression imputation, and (iv) regression-based nearest neighbour hot decking. The last two methods are each applied with two different specifications. Two metric variables are used in the empirical tests. The first is very complex, whereas the second is more ordinary and thus easier to handle. The empirical examples are based on simulations, which clearly show the biases of the various methods and their specifications. In general, method (iv) appears recommendable, although its results are not perfect either.
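To make method (iv) more concrete, the following is a minimal sketch of regression-based nearest neighbour hot decking under simplifying assumptions that are not taken from the paper: a single covariate `x`, an ordinary least squares fit on the complete cases, and missing values of `y` coded as NaN. The function name `regression_nn_hot_deck` and the toy data are hypothetical and serve only as illustration; the specifications actually used in the paper may differ.

```python
import math


def regression_nn_hot_deck(x, y):
    """Impute NaN entries of y with the observed value of the donor whose
    regression prediction is closest to the recipient's prediction.

    Assumption for illustration: one covariate and a simple OLS line.
    """
    # Donors are the units with an observed y value.
    donors = [i for i, value in enumerate(y) if not math.isnan(value)]
    n = len(donors)

    # Fit y = intercept + slope * x on the donors only.
    mean_x = sum(x[i] for i in donors) / n
    mean_y = sum(y[i] for i in donors) / n
    sxx = sum((x[i] - mean_x) ** 2 for i in donors)
    sxy = sum((x[i] - mean_x) * (y[i] - mean_y) for i in donors)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Predict for every unit, donors and recipients alike.
    predicted = [intercept + slope * value for value in x]

    imputed = list(y)
    for i, value in enumerate(y):
        if math.isnan(value):
            # The nearest neighbour is measured on the predicted values.
            nearest = min(donors, key=lambda d: abs(predicted[d] - predicted[i]))
            # Copy the donor's observed value, not the model prediction.
            imputed[i] = y[nearest]
    return imputed


# Hypothetical toy data: two of the five y values are missing.
x = [1.0, 2.2, 3.0, 4.0, 5.5]
y = [2.1, float("nan"), 6.2, 7.9, float("nan")]
imputed = regression_nn_hot_deck(x, y)
```

In this toy example the two missing entries receive the observed values 6.2 and 7.9 from their nearest donors. Because the imputed values are always real observed values, the method cannot produce values outside the observed range, which is one way in which it differs from plain linear regression imputation (method (iii)), where the model prediction itself is imputed.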