A comparison of nearest neighbour and tree-based methods of non-parametric discriminant analysis

W.Z. Liu, A.P. White

doi:10.1080/00949659508811694

Abstract

Nearest neighbour discriminant analysis is compared under cross-validation with a tree-based classification technique, for discrimination tasks involving two classes and Cauchy-distributed error on the independent variables. It is shown that, for the tasks employed, as the number of variables is increased, performance of the nearest neighbour algorithm declines, whereas that of the tree-based technique improves. Above six or seven variables, the tree-based method shows superior discrimination power. The results are explained in terms of the nearest neighbour technique being liable to overfitting when large numbers of variables are used, in contrast to the tree-based technique, which incorporates protection against overfitting by branching on only a subset of the independent variables.
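The kind of comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' actual design: the sample size, the unit location shift between the two classes, the use of 1-nearest-neighbour with Euclidean distance, the small Gini-based tree with a depth limit, and 5-fold cross-validation. The function names (`make_data`, `build_tree`, `cv_accuracy`, and so on) are hypothetical.

```python
import math
import random

def cauchy(rng):
    # Standard Cauchy variate by inverse-CDF sampling.
    return math.tan(math.pi * (rng.random() - 0.5))

def make_data(n_per_class, n_vars, shift=1.0, seed=0):
    # Two classes; each variable is a class-dependent location (assumed
    # shift) plus Cauchy-distributed error.
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([label * shift + cauchy(rng) for _ in range(n_vars)])
            y.append(label)
    return X, y

def fit_1nn(X, y):
    return (X, y)  # nearest neighbour simply stores the training set

def predict_1nn(model, x):
    train_X, train_y = model
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]

def gini(labels):
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, depth=3, min_leaf=5):
    # Greedy binary tree: a leaf is a class label, an internal node is
    # (variable index, threshold, left subtree, right subtree).
    majority = int(2 * sum(y) >= len(y))
    if depth == 0 or len(set(y)) == 1 or len(y) < 2 * min_leaf:
        return majority
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return majority
    _, j, t = best
    lo = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    hi = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return (
        j,
        t,
        build_tree([r for r, _ in lo], [c for _, c in lo], depth - 1, min_leaf),
        build_tree([r for r, _ in hi], [c for _, c in hi], depth - 1, min_leaf),
    )

def predict_tree(node, x):
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def cv_accuracy(X, y, fit, predict, k=5, seed=0):
    # k-fold cross-validated proportion of correct classifications.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)

if __name__ == "__main__":
    for n_vars in (2, 8):
        X, y = make_data(n_per_class=30, n_vars=n_vars)
        nn = cv_accuracy(X, y, fit_1nn, predict_1nn)
        tree = cv_accuracy(X, y, build_tree, predict_tree)
        print(f"{n_vars} variables: 1-NN {nn:.2f}, tree {tree:.2f}")
```

The tree splits on one variable at a time, so it can ignore variables that contribute little, whereas the Euclidean distance used by the nearest neighbour rule sums over every variable and can be dominated by a single heavy-tailed error. A single small run like this is only a demonstration of the setup and should not be expected to reproduce the reported pattern reliably.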
