Abstract

We study the application of a generalized multilayer perceptron (MLP) architecture to tandem feature extraction. In the tandem feature extraction scheme, an MLP with a softmax output layer is discriminatively trained to estimate phoneme posterior probabilities on a labeled database. The outputs of the MLP, after nonlinear transformation and whitening, are used as features in a Gaussian mixture model (GMM) based speech recognizer. We consider three-layer MLPs with a linear output layer. They nonlinearly transform the input data to a higher-dimensional space defined by the outputs of the hidden units and perform linear discriminant analysis (LDA) on the hidden unit outputs. We compare the performance of these features with the direct application of LDA to the input data, which is equivalent to an MLP with linear hidden and output layers. The tandem features outperform those obtained from LDA and from linear-output MLPs on a connected digit recognition task.
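The tandem pipeline summarized above can be sketched end to end: train an MLP to estimate phoneme posteriors, apply a nonlinear transformation (here a logarithm) and whitening to its outputs, and use the resulting features to train a GMM-based recognizer. The sketch below is illustrative only, not the authors' implementation; the feature dimensionality, number of phoneme classes, random data, and the scikit-learn components (MLPClassifier, PCA, GaussianMixture) are all assumptions made for the example.

```python
# Minimal sketch of tandem feature extraction (illustrative assumptions throughout):
# 1) train an MLP to estimate phoneme posteriors, 2) take log posteriors,
# 3) whiten them with PCA, 4) fit a GMM per phoneme class on the whitened features.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.standard_normal((2000, 39))   # e.g. 39-dim acoustic frames (assumed)
y_train = rng.integers(0, 10, size=2000)    # frame-level labels for 10 phoneme classes (assumed)

# Discriminatively trained MLP; predict_proba returns the softmax posterior estimates.
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=200).fit(X_train, y_train)
posteriors = mlp.predict_proba(X_train)

# Nonlinear transformation (log) followed by whitening, as in the tandem scheme.
log_post = np.log(posteriors + 1e-10)
whitener = PCA(whiten=True).fit(log_post)
tandem_features = whitener.transform(log_post)

# The whitened features then feed a GMM-based recognizer, sketched here as one
# diagonal-covariance GMM per phoneme class.
gmms = {c: GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
            .fit(tandem_features[y_train == c])
        for c in np.unique(y_train)}
```

In a real recognizer the per-class GMMs would be the emission densities of HMM states rather than isolated classifiers, but the feature-extraction steps (posterior estimation, log, whitening) are the part the abstract describes.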
