Abstract

In a standard k-nearest neighbor (kNN) technique, imputations of unit-level values in the variables of interest (Y) are based on the k-nearest neighbors in a set of reference units. Nearest is defined with respect to a distance metric in the space of auxiliary variables (X). This study evaluates kNN imputations of Y obtained by selecting, with the same distance metric, the k-nearest locally weighted regression models. Imputations are then computed as predictions using the X values of the k-nearest neighbors in the population. In simulated random sampling from three artificial multivariate populations and two actual univariate populations, with sampling units composed of either a single population element or a cluster of four elements, the new kNN technique: (1) improved the correlation between an imputation and its actual value; (2) lowered the root mean square error (RMSE) of imputations; (3) increased the slope in regressions of actual y values on their imputed values; (4) performed relatively best with k values of 4 and sample sizes of 200 or greater; (5) compared favorably with a recently proposed kNN calibration procedure; and (6) had a 15–28% higher RMSE than a simple local linear regression. Distribution matching consistently worsened RMSE (by about 10%).
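For orientation, the standard kNN imputation baseline that the abstract contrasts against can be sketched as follows. This is an illustrative sketch only, assuming Euclidean distance in X-space and an unweighted neighbor mean; the function and variable names are hypothetical and the paper's actual distance metric and weighting may differ.

```python
import numpy as np

def knn_impute(x_ref, y_ref, x_target, k=4):
    """Standard kNN imputation sketch: for each target unit, impute Y as the
    mean of Y over the k reference units nearest in auxiliary-variable (X)
    space, using Euclidean distance. Assumed details: unweighted mean,
    Euclidean metric (not necessarily the paper's choices)."""
    x_ref = np.asarray(x_ref, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    x_target = np.asarray(x_target, dtype=float)
    imputations = []
    for x in x_target:
        d = np.linalg.norm(x_ref - x, axis=1)  # distances in X-space
        nn = np.argsort(d)[:k]                 # indices of k nearest reference units
        imputations.append(y_ref[nn].mean())   # impute with the neighbor mean
    return np.array(imputations)
```

The technique evaluated in the paper replaces the neighbor mean with predictions from locally weighted regression models selected by the same distance metric.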
