Exact and approximate algorithms for variable selection in linear discriminant analysis

Michael J. Brusco, Douglas Steinley

doi:10.1016/j.csda.2010.05.027

Abstract

Variable selection is a venerable problem in multivariate statistics. In the context of discriminant analysis, the goal is to select a subset of variables that accomplishes one of two objectives: (1) the provision of a parsimonious, yet descriptive, representation of group structure, or (2) the ability to correctly allocate new cases to groups. We present an exact (branch-and-bound) algorithm for variable selection in linear discriminant analysis that identifies subsets of variables that minimize Wilks’ Λ. An important feature of this algorithm is a variable reordering scheme that greatly reduces computation time. We also present an approximate procedure based on tabu search, which can be implemented for a variety of objective criteria designed for either the descriptive or allocation goals associated with discriminant analysis. The tabu search heuristic is especially useful for maximizing the hit ratio (i.e., the percentage of correctly classified cases). Computational results for the proposed methods are provided for two data sets from the literature.
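Branch-and-bound works for Wilks’ Λ because of a monotonicity property. For a variable subset S, Λ(S) = det(W_S)/det(T_S), where W is the pooled within-group and T the total sums-of-squares-and-cross-products (SSCP) matrix. Adding a variable can never increase Λ, so the Λ of any superset is a lower bound on the Λ of every subset below it, and a branch can be pruned once that bound reaches the best value found so far. The following is a minimal pure-Python sketch under those assumptions. It is not the authors' algorithm: it omits their variable reordering scheme and the tabu search heuristic, and the function names (`scatter_matrices`, `wilks`, `branch_and_bound`, `exhaustive`) are hypothetical.

```python
from itertools import combinations


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def sscp(rows):
    """Sums of squares and cross-products about the rows' own mean."""
    n, p = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    s = [[0.0] * p for _ in range(p)]
    for x in rows:
        dx = [x[i] - mu[i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += dx[i] * dx[j]
    return s


def scatter_matrices(X, y):
    """Return (W, T): pooled within-group and total SSCP matrices."""
    p = len(X[0])
    W = [[0.0] * p for _ in range(p)]
    for g in set(y):
        Sg = sscp([x for x, lab in zip(X, y) if lab == g])
        for i in range(p):
            for j in range(p):
                W[i][j] += Sg[i][j]
    return W, sscp(X)


def wilks(W, T, subset):
    """Wilks' lambda det(W_S) / det(T_S) for the variables in subset."""
    def pick(M):
        return [[M[i][j] for j in subset] for i in subset]
    return det(pick(W)) / det(pick(T))


def branch_and_bound(W, T, k):
    """Find a k-variable subset minimizing Wilks' lambda.

    Branches by deleting variables in increasing index order, so each
    subset is reached at most once. Deleting a variable never decreases
    lambda, so lambda of the current superset bounds all subsets below it.
    """
    p = len(W)
    best = [float("inf"), None]

    def visit(kept, start):
        lam = wilks(W, T, kept)
        if lam >= best[0]:
            return  # no subset of `kept` can beat the incumbent
        if len(kept) == k:
            best[0], best[1] = lam, tuple(kept)
            return
        for j in range(start, p):
            visit([v for v in kept if v != j], j + 1)

    visit(list(range(p)), 0)
    return best[0], best[1]


def exhaustive(W, T, k):
    """Reference answer: evaluate every k-subset (feasible only for small p)."""
    return min((wilks(W, T, list(c)), c) for c in combinations(range(len(W)), k))
```

For small problems `exhaustive` gives a reference answer to check the pruned search against. On real data a linear-algebra library (for example `numpy.linalg.det`) would replace the hand-written `det`; the pure-Python version only keeps the sketch dependency-free.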
