Abstract
In a standard k-nearest neighbor (kNN) technique, unit-level values of the variables of interest (Y) are imputed from the k nearest neighbors in a set of reference units. "Nearest" is defined with respect to a distance metric in the space of auxiliary variables (X). This study evaluates kNN imputations of Y with a selection, by the same distance metric, of k-nearest locally weighted regression models. Imputations are obtained as predictions using the X values of the k-nearest neighbors in the population. In simulated random sampling from three artificial multivariate populations and two actual univariate populations, with sampling units composed of either a single population element or a cluster of four elements, the new kNN technique: (1) improved the correlation between an imputation and its actual value; (2) lowered the root mean square error (RMSE) of imputations; (3) increased the slope in regressions of actual y values on their imputed values; (4) performed relatively best with k values of 4 and sample sizes of 200 or greater; (5) compared favorably with a recently proposed kNN calibration procedure; and (6) had a higher RMSE (by 15–28%) than a simple local linear regression. Distribution matching consistently increased RMSE (+10%).
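As a rough illustration of the two baselines named in the abstract, the following Python sketch contrasts a standard kNN imputation (the mean of Y over the k nearest reference units) with a simple local linear regression fitted to those same neighbors. It is not the technique proposed in the study. The function names, the Euclidean distance, the unweighted least-squares fit, and the restriction to a single Y variable are all assumptions made for this example.

```python
import numpy as np

def knn_impute(X_ref, Y_ref, X_target, k=4):
    """Standard kNN: impute Y as the mean over the k nearest reference units.

    Assumes Euclidean distance in X space and a single Y variable.
    """
    X_ref = np.asarray(X_ref, dtype=float)
    Y_ref = np.asarray(Y_ref, dtype=float)
    X_target = np.asarray(X_target, dtype=float)
    # Pairwise distances, shape (n_target, n_ref)
    d = np.linalg.norm(X_target[:, None, :] - X_ref[None, :, :], axis=2)
    # Indices of the k nearest reference units for each target unit
    nn = np.argsort(d, axis=1)[:, :k]
    return Y_ref[nn].mean(axis=1)

def local_linear_impute(X_ref, Y_ref, X_target, k=4):
    """Simple local linear regression: fit a line to the k nearest
    reference units and predict at the target's X values.

    Assumes an unweighted least-squares fit; with fewer neighbors than
    coefficients, lstsq returns the minimum-norm solution.
    """
    X_ref = np.asarray(X_ref, dtype=float)
    Y_ref = np.asarray(Y_ref, dtype=float)
    X_target = np.asarray(X_target, dtype=float)
    preds = np.empty(len(X_target))
    for i, x in enumerate(X_target):
        d = np.linalg.norm(X_ref - x, axis=1)
        nn = np.argsort(d)[:k]
        # Design matrix with an intercept column
        A = np.column_stack([np.ones(len(nn)), X_ref[nn]])
        coef, *_ = np.linalg.lstsq(A, Y_ref[nn], rcond=None)
        preds[i] = coef[0] + x @ coef[1:]
    return preds
```

Both functions take the reference X as an array of shape (n_ref, p), the reference Y as a vector of length n_ref, and the target X as an array of shape (n_target, p). The kNN mean can never leave the range of the observed Y values, whereas the local fit can extrapolate along a trend in X.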