Abstract

Imputation procedures are frequently used to treat nonresponse. With random hot deck imputation, missing values are replaced by valid observed values from other units in the same dataset. The recently developed balanced nearest neighbor imputation method, implemented in the SwissCheese R package, generates random hot deck imputations under certain balancing constraints in order to decrease the variance of the total estimator in the presence of multivariate nonresponse. The method relies on a notion of neighborhood between units, based on a distance measure that becomes difficult to define in high dimensions. In contrast to hot deck methods, many imputation procedures obtain replacement values from prediction models fitted to the observed data. The missForest method, which uses random forests as prediction models, is an example of this approach. In this article, we propose a new approach that uses the two methods in a complementary manner: we refine the distance measure in the SwissCheese method using missForest predictions. Through a simulation study on empirical data from the Swiss Survey on Income and Living Conditions, we demonstrate reductions in the Monte Carlo variance, bias, and mean squared error of the totals obtained with our proposed imputed estimator compared to those obtained using SwissCheese alone.
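The idea of combining model predictions with donor-based imputation can be illustrated with a minimal Python sketch. It is not the SwissCheese or missForest implementation and it omits the balancing constraints entirely. It assumes that predictions for every cell are supplied by some external model (for instance a random forest), that distances are plain Euclidean distances computed after missing cells are filled with those predictions, and that a donor is drawn at random among the k nearest complete respondents. The abstract does not state how the distance is refined, so this particular choice is an assumption, and the names `fill_with_predictions`, `hot_deck_impute`, and `k` are hypothetical.

```python
import math
import random


def fill_with_predictions(row, pred_row):
    """Return a copy of row where missing cells (None) take the predicted value."""
    return [p if v is None else v for v, p in zip(row, pred_row)]


def hot_deck_impute(data, predictions, k=3, seed=0):
    """Random hot deck among the k nearest complete donors.

    data        : list of rows, with None marking a missing value
    predictions : list of rows of the same shape, produced by any
                  prediction model (assumed to be given, not fitted here)
    Distances are computed on rows completed with the predictions, which
    is the assumed "refinement"; imputed values are always observed donor
    values, never the predictions themselves.
    """
    rng = random.Random(seed)
    # Donors are units with no missing value at all.
    donors = [i for i, row in enumerate(data) if all(v is not None for v in row)]
    if not donors:
        raise ValueError("no complete unit available as donor")
    donor_set = set(donors)
    filled = [fill_with_predictions(r, p) for r, p in zip(data, predictions)]

    imputed = []
    for i, row in enumerate(data):
        if i in donor_set:
            imputed.append(list(row))
            continue
        # Rank donors by Euclidean distance on the prediction-completed rows.
        nearest = sorted(donors, key=lambda d: math.dist(filled[i], filled[d]))[:k]
        donor = rng.choice(nearest)
        # Copy the donor's observed values into the missing cells only.
        imputed.append([data[donor][j] if v is None else v for j, v in enumerate(row)])
    return imputed


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 20.0], [9.0, 90.0], [1.9, None]]
    predictions = [[1.0, 10.0], [2.0, 20.0], [9.0, 90.0], [1.9, 19.0]]
    print(hot_deck_impute(data, predictions, k=1))
```

With `k=1` the draw is deterministic and the last unit receives the second variable of its closest donor. Larger values of `k` restore the random element of hot deck imputation.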
