Abstract

We describe the design and use of linear hyperplanes to partition the feature space of a classifier. The objective of the partitioning is to minimize the average entropy of the class distribution in the final partitions. Each hyperplane is characterized by a vector ν_n and a scalar h_n, which are computed with the objective of maximizing the mutual information associated with the partitioning. We show that the problem of designing the ν_n can be simplified to an approximately equivalent problem whose solution, interestingly, turns out to be the linear discriminant of the data. We also describe a decision-tree-based technique that partitions the feature space hierarchically: each node in the decision tree represents a linear hyperplane that further splits the feature space into two regions. The end result of the tree-growing process is a partition of the entire feature space into nonoverlapping regions, each bounded by a number of hyperplanes. Because the criterion for the design of the hyperplanes is the minimization of the average class entropy of the regions, each region is characterized by the occurrence of only a small number of classes. The partitioning information provided by the decision tree can therefore be used to eliminate many classes from consideration, simplifying the job of the classifier. We show the application of the decision tree as a preprocessor to a classifier in the speech recognition problem. Here the classes are modeled with mixtures of tens to hundreds of thousands of Gaussians, and the use of the partitioning information reduces the computation associated with the classification process by factors larger than 20 with negligible degradation in the word error rate (note, however, that the overall decoding process comprises other steps in addition to classification; the speedup above does not reduce the computation of those other steps). We also compare the performance of this quantization scheme with standard Gaussian clustering schemes and show that the decision-tree-based quantization provides better performance in most cases.
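For concreteness, the following is a minimal sketch of the tree-growing idea the abstract describes, not the authors' implementation: at each node the data are projected onto the leading linear discriminant direction, the threshold h_n is swept to minimize the sample-weighted average class entropy of the two half-spaces (equivalently, to maximize the mutual information of the split), and the process recurses. All function names, the naive threshold sweep, and the stopping rules (max_depth, min_samples) are illustrative assumptions.

```python
import numpy as np

def class_entropy(y):
    """Entropy (bits) of the empirical class distribution in y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def lda_direction(X, y):
    """Leading multi-class LDA direction: top eigenvector of Sw^{-1} Sb."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    Sw += 1e-6 * np.eye(d)  # small regularizer so the solve is well conditioned
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    v = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return v / np.linalg.norm(v)

def best_threshold(proj, y):
    """Naive sweep over candidate thresholds h; keep the one minimizing
    the sample-weighted average class entropy of the two half-spaces."""
    order = np.argsort(proj)
    proj_s, y_s = proj[order], y[order]
    n = len(y_s)
    best_h, best_ent = None, np.inf
    for i in range(1, n):
        if proj_s[i] == proj_s[i - 1]:
            continue
        h = 0.5 * (proj_s[i] + proj_s[i - 1])
        ent = (i * class_entropy(y_s[:i]) + (n - i) * class_entropy(y_s[i:])) / n
        if ent < best_ent:
            best_h, best_ent = h, ent
    return best_h, best_ent

def grow_tree(X, y, max_depth=8, min_samples=50):
    """Recursively split the feature space with hyperplanes (v, h)."""
    if max_depth == 0 or len(y) < min_samples or len(np.unique(y)) == 1:
        return {"classes": np.unique(y)}           # leaf: surviving class short-list
    v = lda_direction(X, y)
    h, ent = best_threshold(X @ v, y)
    if h is None or ent >= class_entropy(y):       # split gives no entropy reduction
        return {"classes": np.unique(y)}
    left = X @ v <= h
    return {"v": v, "h": h,
            "left": grow_tree(X[left], y[left], max_depth - 1, min_samples),
            "right": grow_tree(X[~left], y[~left], max_depth - 1, min_samples)}

def shortlist(tree, x):
    """Descend to the leaf region containing x; return its class short-list."""
    while "v" in tree:
        tree = tree["left"] if x @ tree["v"] <= tree["h"] else tree["right"]
    return tree["classes"]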
