An N-best candidates-based discriminative training for speech recognition applications

Jung-Kuei Chen, F.K. Soong

doi:10.1109/89.260363

Abstract

The authors propose an N-best candidates-based discriminative training procedure for constructing high-performance HMM speech recognizers. The algorithm has two distinct features: N-best hypotheses are used for training discriminative models, and a new frame-level loss function is minimized to improve the separation between the correct and incorrect hypotheses. The N-best candidates are decoded with the authors' recently proposed tree-trellis fast search algorithm. The new frame-level loss function, defined as a halfwave rectified log-likelihood difference between the correct and competing hypotheses, is minimized over all training tokens by adjusting the HMM parameters along a gradient descent direction. Two speech recognition applications have been tested: speaker-independent, small-vocabulary (ten Mandarin Chinese digits) continuous speech recognition, and speaker-trained, large-vocabulary (5000 commonly used Chinese words) isolated word recognition. Significant performance improvement over traditional maximum likelihood trained HMMs has been obtained. In the connected Chinese digit recognition experiment, the string error rate is reduced from 17.0% to 10.8% for unknown length decoding and from 8.2% to 5.2% for known length decoding. In the large-vocabulary, isolated word recognition experiment, the recognition error rate is reduced from 7.2% to 3.8%. Additionally, the authors found that using more relaxed decoding constraints in preparing N-best hypotheses yields better recognition results.
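
As a rough illustration of the halfwave rectified loss described in the abstract, the following Python sketch sums, over frames, the positive part of the log-likelihood difference between a competing hypothesis and the correct one. The abstract does not give the exact formula, so several things here are assumptions: that per-frame log-likelihoods are available and frame-aligned for both hypotheses, that the difference is taken as competing minus correct, and that the losses of the N-best competitors are simply added. The function names `frame_level_loss` and `n_best_loss` are hypothetical, and the numbers are made up for the example.

```python
def frame_level_loss(correct_ll, competing_ll):
    """Halfwave rectified log-likelihood difference, summed over frames.

    Assumption: both inputs are per-frame log-likelihoods of equal length,
    and the difference is (competing - correct), clipped at zero.
    """
    if len(correct_ll) != len(competing_ll):
        raise ValueError("hypotheses must cover the same number of frames")
    # Only frames where the competitor outscores the correct hypothesis
    # contribute; all other frames add zero.
    return sum(max(0.0, comp - corr)
               for corr, comp in zip(correct_ll, competing_ll))


def n_best_loss(correct_ll, competitors):
    """Assumed aggregation: add the loss over all N-best competitors."""
    return sum(frame_level_loss(correct_ll, comp) for comp in competitors)


# Illustrative per-frame log-likelihoods (not from the paper).
correct = [-1.0, -2.0, -1.5]
competitor_a = [-1.5, -1.0, -1.5]   # beats the correct hypothesis at frame 2
competitor_b = [-3.0, -3.0, -3.0]   # never beats the correct hypothesis

loss_a = frame_level_loss(correct, competitor_a)
total = n_best_loss(correct, [competitor_a, competitor_b])
print(loss_a, total)
```

Under this reading, a gradient descent update would only be driven by frames where a competitor outscores the correct hypothesis, since the rectified term is zero everywhere else.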
