Abstract

The standard Hidden Markov Model (HMM) assumes that successive observations are independent of one another given the state sequence, which leads to a poor trajectory model for speech. Many explicit trajectory modeling techniques have been studied in the past to improve trajectory modeling for HMMs. However, these techniques do not yield promising improvements over conventional HMM systems, where differential parameters and Gaussian mixture models have been used implicitly to circumvent the poor trajectory modeling of the HMM. Recently, semi-parametric trajectory modeling techniques based on temporally varying model parameters, such as fMPE and pMPE, have been shown to yield promising improvements over state-of-the-art systems on large vocabulary continuous speech recognition tasks. These techniques use high-dimensional posterior features derived from a long span of acoustic features to model temporally varying attributes of the speech signal. Bases corresponding to these posterior features are then discriminatively estimated to yield temporally varying mean (fMPE) and precision matrix (pMPE) parameters. Motivated by the success of fMPE and pMPE, Temporally Varying Weight Regression (TVWR) was recently proposed to model the HMM trajectory implicitly using time-varying Gaussian weights. In this paper, a complete formulation of TVWR is given based on a probabilistic modeling framework. Parameter estimation formulae are derived for both the maximum likelihood (ML) and minimum phone error (MPE) criteria. Experimental results on the Wall Street Journal (CSR-WSJ0 + WSJ1) and Aurora 4 corpora show that consistent and promising improvements over standard HMM systems are obtained on both the 20k open-vocabulary recognition task (NIST Nov'92 WSJ0) and the 5k closed-vocabulary noisy speech recognition task, under both the ML and MPE criteria.
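
To make the description of TVWR more concrete, the sketch below illustrates one way time-varying Gaussian weights could enter the standard HMM output density, with the weights regressed from a posterior feature vector as the abstract describes. The regression form and the symbols h_t and w_{jm} are illustrative assumptions only; the exact formulation is given in the body of the paper.

% Standard HMM/GMM output density with fixed mixture weights c_{jm}:
%   b_j(o_t) = \sum_m c_{jm} \, \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm})
% Sketch of a TVWR-style density: the fixed weights are replaced by
% time-varying weights c_{jm}(t) obtained from a regression on a
% posterior feature vector h_t (illustrative form, not the paper's exact one):
\[
  b_j(o_t) = \sum_{m} c_{jm}(t)\,\mathcal{N}\!\left(o_t;\, \mu_{jm}, \Sigma_{jm}\right),
  \qquad
  c_{jm}(t) = \frac{c_{jm}\, \boldsymbol{w}_{jm}^{\top} \boldsymbol{h}_t}
                   {\sum_{m'} c_{jm'}\, \boldsymbol{w}_{jm'}^{\top} \boldsymbol{h}_t},
\]
% where the normalization over m' keeps the weights summing to one at every frame t.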
