We consider the classification problem for two equiprobable multivariate Gaussian densities with a common covariance matrix. When Anderson's W statistic is used for classification, there exists an optimum number of features, p_opt, such that, for a given sample size, the average probability of misclassification first decreases as the number of features is increased, attains a minimum at p_opt, and then begins to increase. We examine this peaking phenomenon for several cases and provide expressions relating p_opt to the number of available training samples and the Mahalanobis distance between the two populations. We also show that, to prevent peaking, each additional feature's contribution to the Mahalanobis distance must be a certain proportion of the accumulated Mahalanobis distance.
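The peaking effect can be illustrated with a small Monte-Carlo sketch (not the paper's derivation): a plug-in form of the W classifier built from sample means and the pooled sample covariance, applied to two Gaussian classes with identity covariance. The per-feature mean offsets 1.2/j are an illustrative assumption chosen so that the accumulated Mahalanobis distance saturates as features are added; with the training sample size held fixed, the estimated error first falls and then rises with dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_error(p, n=10, reps=60, n_test=500):
    """Monte-Carlo estimate of the average misclassification probability of the
    plug-in W classifier with p features and n training samples per class."""
    # Hypothetical diminishing per-feature contribution to the Mahalanobis distance.
    delta = 1.2 / np.arange(1, p + 1)
    errs = []
    for _ in range(reps):
        # Training samples: class 1 ~ N(delta, I), class 2 ~ N(0, I).
        x1 = rng.standard_normal((n, p)) + delta
        x2 = rng.standard_normal((n, p))
        m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
        # Pooled sample covariance (2n - 2 degrees of freedom).
        S = ((n - 1) * np.cov(x1, rowvar=False, ddof=1)
             + (n - 1) * np.cov(x2, rowvar=False, ddof=1)) / (2 * n - 2)
        S = np.atleast_2d(S)
        a = np.linalg.pinv(S) @ (m1 - m2)   # discriminant direction
        c = (m1 + m2) / 2                   # midpoint threshold
        # Fresh test samples; classify by the sign of the W statistic.
        t1 = rng.standard_normal((n_test, p)) + delta
        t2 = rng.standard_normal((n_test, p))
        e1 = np.mean((t1 - c) @ a <= 0)     # class-1 points misclassified
        e2 = np.mean((t2 - c) @ a > 0)      # class-2 points misclassified
        errs.append((e1 + e2) / 2)
    return float(np.mean(errs))

errors = {p: avg_error(p) for p in (1, 2, 4, 8, 14)}
```

With n = 10 samples per class, the error typically reaches its minimum at a small dimension and climbs again as p approaches the pooled degrees of freedom, where the covariance estimate degrades.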