Variable selection strategies for nearest neighbor imputation methods used in remote sensing based forest inventory

Petteri Packalén, Hailemariam Temesgen, Matti Maltamo

doi:10.5589/m12-046

Abstract

We examined the problem of selecting predictor variables for Nearest Neighbor (NN) imputation in remote sensing based forest inventory. Eighty-three variables were calculated from Airborne Laser Scanning data and aerial images, with responses being either dominant height or a set of five common stand attributes. Three different approaches to selecting predictor variables were compared. Analyses were repeated with three different NN imputation methods using a varying number of predictor variables. Results indicated that variable selection is justified, but it must be done properly. The most accurate method to select predictors was to minimize error using Simulated Annealing. For a single response, the most accurate imputation method was Random Forest proximity matrix-based imputation, whereas Most Similar Neighbor was the most accurate for five responses. An optimization-based distance metric also worked well. We also examined the degree to which different imputation methods are prone to overfitting, as well as how to perform cross-validation properly in NN imputation.
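
To make the idea of selecting predictors by minimizing error with Simulated Annealing concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single response, a plain 1-NN imputation with squared Euclidean distance (swapped in for the Most Similar Neighbor and Random Forest methods named above), and leave-one-out RMSE as the error to minimize. The function names `loo_rmse` and `anneal_select`, the swap-one-variable move, and the cooling parameters are all hypothetical choices made for illustration.

```python
import math
import random


def loo_rmse(X, y, subset):
    """Leave-one-out RMSE of 1-NN imputation using only the columns in `subset`."""
    n = len(X)
    sq_err = 0.0
    for i in range(n):
        best_j, best_d = None, math.inf
        for j in range(n):
            if j == i:
                continue  # a target observation may not serve as its own neighbor
            d = sum((X[i][k] - X[j][k]) ** 2 for k in subset)
            if d < best_d:
                best_j, best_d = j, d
        sq_err += (y[i] - y[best_j]) ** 2
    return math.sqrt(sq_err / n)


def anneal_select(X, y, n_select, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for `n_select` predictor columns that minimize leave-one-out RMSE.

    Assumes n_select is smaller than the number of columns in X.
    """
    rng = random.Random(seed)
    p = len(X[0])
    current = rng.sample(range(p), n_select)
    cur_err = loo_rmse(X, y, current)
    best, best_err = list(current), cur_err
    temp = t0
    for _ in range(n_iter):
        # Propose a neighboring subset: swap one selected column for an unused one.
        candidate = list(current)
        unused = [k for k in range(p) if k not in current]
        candidate[rng.randrange(n_select)] = rng.choice(unused)
        cand_err = loo_rmse(X, y, candidate)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature decreases.
        if cand_err < cur_err or rng.random() < math.exp((cur_err - cand_err) / temp):
            current, cur_err = candidate, cand_err
            if cur_err < best_err:
                best, best_err = list(current), cur_err
        temp *= cooling  # geometric cooling schedule (an assumption of this sketch)
    return sorted(best), best_err
```

Because the search minimizes the same leave-one-out error that it reports, the returned error is likely to be optimistic. An honest accuracy estimate would presumably need observations that played no part in the selection, which relates to the overfitting and cross-validation questions raised above.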

Full Text