A comparison of discriminant procedures for binary variables

Ognian K Asparoukhov, Wojtek J Krzanowski

doi:10.1016/s0167-9473(01)00032-9

Abstract

Thirteen discriminant procedures are compared by applying them to five real sets of binary data and evaluating their leave-one-out error rates. Three versions of each data set have been used, containing respectively “large”, “moderate” and “small” numbers of variables. To achieve the latter two categories, reduction of variables was first carried out using the all-subsets approach based on Kullback's information divergence measure. Sample size, number of non-empty multinomial cells and Empirical Integrated Rank are taken into account in assessment of classifier effectiveness. While the data sets are ones that arose during day-to-day statistical consulting, the empirical basis for drawing widespread conclusions is inevitably limited. Nevertheless, the study did highlight the following interesting features. The Kernel, Fourier and Hall's k-nearest neighbour classifiers had a tendency to overfit the data. The mixed integer programming classifier was clearly better than the other linear classifiers, and linear discriminant analysis had better results than logistic discrimination especially for small sample sizes. The second-order Bahadur procedure was generally very effective when the number of variables was large, but only if the sample size was large when the number of variables was small. The second-order log-linear models were very effective when the number of variables was small or when the sample sizes were large. Quadratic discrimination and Hills’ k-nearest neighbour classification both performed poorly. The traditional statistical classifiers did not cope well with sparse binary data; the non-traditional classifiers such as neural networks or mixed integer programming classifiers were much better in such circumstances.
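
The leave-one-out error rate used as the evaluation criterion above can be illustrated with a minimal sketch. It holds out each observation in turn, fits the classifier on the rest, and reports the fraction of held-out observations that are misclassified. The 1-nearest-neighbour rule with Hamming distance is only a simple stand-in chosen for illustration; it is not claimed to be any of the thirteen procedures as implemented in the study. The function names and the toy data are hypothetical.

```python
from typing import Callable, List

BinaryVector = List[int]


def hamming(a: BinaryVector, b: BinaryVector) -> int:
    """Number of positions at which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def one_nn_predict(train_x: List[BinaryVector], train_y: List[int],
                   query: BinaryVector) -> int:
    """Illustrative stand-in classifier: label of the nearest training vector.

    Ties in distance are broken in favour of the earliest training vector.
    """
    nearest = min(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    return train_y[nearest]


def leave_one_out_error(
    xs: List[BinaryVector],
    ys: List[int],
    predict: Callable[[List[BinaryVector], List[int], BinaryVector], int],
) -> float:
    """Fraction of observations misclassified when each is held out in turn."""
    errors = 0
    for i in range(len(xs)):
        # Train on everything except observation i, then classify observation i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)


if __name__ == "__main__":
    # Hypothetical toy data: five observations on three binary variables.
    toy_x = [[0, 0, 0], [0, 0, 1], [1, 1, 1], [1, 1, 0], [0, 1, 1]]
    toy_y = [0, 0, 1, 1, 1]
    print(leave_one_out_error(toy_x, toy_y, one_nn_predict))
```

Any classifier with the same `predict(train_x, train_y, query)` shape could be passed in place of `one_nn_predict`, which is how several procedures could be compared on the same data under this sketch.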
