Abstract

Nearest neighbour discriminant analysis is compared under cross-validation with a tree-based classification technique, for discrimination tasks involving two classes and Cauchy-distributed error on the independent variables. It is shown that, for the tasks employed, as the number of variables is increased, performance of the nearest neighbour algorithm declines, whereas that of the tree-based technique improves. Above six or seven variables, the tree-based method shows superior discrimination power. The results are explained in terms of the nearest neighbour technique being liable to overfitting when large numbers of variables are used, in contrast to the tree-based technique, which incorporates protection against overfitting by branching on only a subset of the independent variables.
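
The kind of comparison described above can be sketched in a few lines of standard-library Python. This is an illustration only, not the experimental design of the paper: the data generator (`make_data`, with class 1 shifted by an assumed `shift` on every variable), the sample size, the five-fold split, and the use of a single-split decision stump as a stand-in for the tree-based technique are all assumptions made here for brevity. The stump is chosen because, like the tree method described, it branches on only a subset (here, one) of the independent variables.

```python
import math
import random


def cauchy(rng, scale=1.0):
    """Draw one Cauchy variate by inverting the CDF."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))


def make_data(rng, n_samples, n_vars, shift=2.0):
    """Two alternating classes; class 1 is shifted by `shift` (assumed) on each variable."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append([shift * label + cauchy(rng) for _ in range(n_vars)])
        y.append(label)
    return X, y


def nn_predict(X_train, y_train, x):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    best = min(range(len(X_train)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    return y_train[best]


def fit_stump(X_train, y_train):
    """One-split tree: choose (variable, threshold, polarity) with fewest training errors."""
    best = (len(y_train) + 1, 0, 0.0, 1)
    for j in range(len(X_train[0])):
        for t in sorted({row[j] for row in X_train}):
            # Errors made by the rule "predict class 1 when x[j] > t".
            errs = sum((row[j] > t) != bool(lab)
                       for row, lab in zip(X_train, y_train))
            for polarity, e in ((1, errs), (0, len(y_train) - errs)):
                if e < best[0]:
                    best = (e, j, t, polarity)
    return best[1:]


def stump_predict(stump, x):
    """Predict `polarity` above the threshold and the other class below it."""
    j, t, polarity = stump
    return polarity if x[j] > t else 1 - polarity


def cv_accuracy(X, y, k, fit, predict):
    """k-fold cross-validated accuracy; fold f holds out samples with index % k == f."""
    correct = 0
    for f in range(k):
        train = [i for i in range(len(X)) if i % k != f]
        test = [i for i in range(len(X)) if i % k == f]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)


def compare(n_vars, n_samples=100, k=5, seed=0):
    """Return (nearest-neighbour accuracy, stump accuracy) for one simulated task."""
    rng = random.Random(seed)
    X, y = make_data(rng, n_samples, n_vars)
    nn_acc = cv_accuracy(X, y, k, lambda Xt, yt: (Xt, yt),
                         lambda m, x: nn_predict(m[0], m[1], x))
    stump_acc = cv_accuracy(X, y, k, fit_stump, stump_predict)
    return nn_acc, stump_acc


if __name__ == "__main__":
    for n_vars in (1, 2, 4, 8):
        print(n_vars, compare(n_vars))
```

Running the script prints one pair of cross-validated accuracies per variable count. Whether the trend matches the one reported above depends on the assumed task, the seed, and the sample size, so a single run should not be read as a replication.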
