A novel objective function for improved phoneme recognition using time-delay neural networks

J.B Hampshire,A.H Waibel

doi:10.1109/72.80233

Abstract

Single-speaker and multispeaker recognition results are presented for the voice-stop consonants /b,d,g/ using time-delay neural networks (TDNNs) with a number of enhancements, including a new objective function for training these networks. The new objective function, called the classification figure of merit (CFM), differs markedly from the traditional mean-squared-error (MSE) objective function and the related cross entropy (CE) objective function. Where the MSE and CE objective functions seek to minimize the difference between each output node and its ideal activation, the CFM function seeks to maximize the difference between the output activation of the node representing incorrect classifications. A simple arbitration mechanism is used with all three objective functions to achieve a median 30% reduction in the number of misclassifications when compared to TDNNs trained with the traditional MSE back-propagation objective function alone.

Full Text