Abstract

We describe feature-space and model-space discriminative training for a new class of acoustic models called Bayesian sensing hidden Markov models (BS-HMMs). In BS-HMMs, speech data is represented by a set of state-dependent basis vectors. The relevance of a feature vector to different bases is determined by the precision matrices of the sensing weights. The basis vectors and the precision matrices of the reconstruction errors are jointly estimated by optimizing a maximum mutual information (MMI) criterion. Additionally, we discuss the training of an fMPE-style discriminative feature transformation under the same criterion given these models. Experimental results on an LVCSR task show that the proposed models outperform discriminatively trained conventional HMMs with Gaussian mixture models (GMMs). Cross-adapting the baseline GMM-HMMs to the BS-HMM output yields a 6% relative gain, which indicates that the two systems make different errors.
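
For orientation, the basis representation and the training criterion named above can be sketched in equations. The notation is an assumption introduced here, not taken from the abstract: x_t is a feature vector, \Phi_i the matrix of basis vectors for state i, w_t the sensing weights with precision matrix A_i, \epsilon_t the reconstruction error with precision matrix R_i, X_r and W_r the r-th training utterance and its reference transcript, and \lambda the model parameters. The zero-mean Gaussian forms are likewise an illustrative assumption. The second expression is the standard form of the MMI objective.

```latex
% Basis representation of a frame in state i (illustrative notation, assumed)
x_t = \Phi_i w_t + \epsilon_t, \qquad
w_t \sim \mathcal{N}\!\left(0, A_i^{-1}\right), \qquad
\epsilon_t \sim \mathcal{N}\!\left(0, R_i^{-1}\right)

% Standard MMI objective over training utterances r
\mathcal{F}_{\mathrm{MMI}}(\lambda) =
  \sum_{r} \log
  \frac{p_\lambda(X_r \mid W_r)\, P(W_r)}
       {\sum_{W} p_\lambda(X_r \mid W)\, P(W)}
```

The numerator scores the reference transcript and the denominator sums over competing word sequences, so raising the objective favors parameters that separate the correct hypothesis from its competitors.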
