Abstract

Our research addresses the supervised learning problem. Informally stated, the supervised learning problem is defined as follows: given a class-tagged training dataset (e.g. a set of vectors with accompanying class tags), construct a classifier that, given an untagged vector, will predict its class membership. A classifier's performance is measured by its generalization ability: the capacity to correctly classify previously unseen, untagged samples. Supervised learning techniques are applicable to a wide range of engineering problems; speech recognition, handwritten character recognition, medical diagnostics, and data mining tasks are among the more obvious candidate problems for supervised learning strategies. Early efforts toward solving the supervised learning problem included linear methods such as the least squares linear classifier, the Fisher linear discriminant, and perceptron learning algorithms. These approaches shared an inherent inflexibility owing to their linear nature. Advances in available computer speed and storage capacity have sparked a renewed interest in supervised learning, witnessed by a wide range of innovations. Notably, neural networks, support vector machines (SVMs), and decision trees offer state-of-the-art, nonlinear classification performance. Boosting techniques, which aim to iteratively improve classification performance by redistributing each training sample's weight across training rounds, have been demonstrated to be very effective. Our contribution aims to devise algorithms that combine the classical approaches' theoretical elegance and closed-form analytic solution with the flexibility offered by modern nonlinear approaches. We generalize the linear decision boundaries offered by Fisher's linear discriminant (FLD) algorithm using kernel functions.
The primary design goals are: (i) the ability to naturally handle data that is not linearly separable, as the algorithm does not require a user-specified regularization parameter for penalizing misclassifications; and (ii) modest computational load: by relying on matrix inversion, the algorithm is capable of handling very large training datasets.
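The kernelized FLD described above can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's exact formulation: the RBF kernel choice, the function names, and the tiny ridge term (added purely for numerical stability when inverting, not as a tuned regularization parameter) are all assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gram matrix of the RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_fld_fit(X, y, gamma=1.0, ridge=1e-6):
    """Fit a kernel Fisher discriminant for binary labels y in {0, 1}.

    The expansion coefficients alpha have a closed-form solution obtained
    by (regularized) matrix inversion; 'ridge' is only for numerical
    stability, not a user-tuned misclassification penalty.
    """
    K = rbf_kernel(X, X, gamma)
    n = len(y)
    M = []                               # per-class kernel mean vectors
    N = np.zeros((n, n))                 # within-class scatter in feature space
    for c in (0, 1):
        Kc = K[:, y == c]                # n x n_c columns for class c
        nc = Kc.shape[1]
        M.append(Kc.mean(axis=1))
        H = np.eye(nc) - np.full((nc, nc), 1.0 / nc)   # centering matrix
        N += Kc @ H @ Kc.T
    alpha = np.linalg.solve(N + ridge * np.eye(n), M[1] - M[0])
    b = -0.5 * (alpha @ M[0] + alpha @ M[1])  # threshold at projected midpoint
    return alpha, b

def kernel_fld_predict(X_train, alpha, b, X_new, gamma=1.0):
    # Project new points onto the discriminant and threshold at the midpoint.
    proj = rbf_kernel(X_new, X_train, gamma) @ alpha + b
    return (proj > 0).astype(int)
```

The decision boundary is linear in the kernel-induced feature space but nonlinear in the input space, which is how the method retains FLD's analytic solution while handling non-linearly-separable data.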
