Abstract

We apply the PDF projection theorem to generalize the hidden Markov model (HMM) to accommodate multiple simultaneous segmentations of the raw data and multiple feature extraction transformations. Different segment sizes and feature transformations are assigned to each state. The algorithm averages over all allowable segmentations by mapping the segmentations to a “proxy” HMM and using the forward procedure. A by-product of the algorithm is the set of a posteriori state probability estimates that serve as a description of the input data. These probabilities have simultaneously the temporal resolution of the smallest processing windows and the processing gain and frequency resolution of the largest processing windows. The method is demonstrated on the problem of precisely modeling the consonant “T” in order to detect the presence of a distinct “burst” component. We compare the algorithm against standard speech analysis methods using data from the TIMIT corpus.

Highlights

  • The Hidden Markov Model (HMM) [1] combined with spectral analysis using cepstral coefficients [2] on fixed-length analysis windows remains at the forefront of automatic speech recognition (ASR) technology

  • The need for a fixed-size window arises from the fundamental probabilistic approach that underlies the method and depends on the comparison of likelihood functions formed on a common feature space

  • One could not directly compare two likelihood functions if they are defined on different feature spaces


Summary

INTRODUCTION

The Hidden Markov Model (HMM) [1] combined with spectral analysis using cepstral coefficients [2] on fixed-length analysis windows remains at the forefront of automatic speech recognition (ASR) technology. Although the value of L(X) calculated by the forward procedure operating on Pt,fq changes, it remains a valid joint PDF of X. We know this because all we have done is replace the conditional PDFs P(X|Q), which assume that all the segments are independent, with another PDF that assumes statistical dependence within the wait-state sequences associated with a given state. At this point we have a raw-data-based MRHMM that we can compute efficiently using the forward procedure operating on Pt,fq. Let p(zs|s) be a PDF estimate of the feature set zs based on training data from state s. J(x; Ts, H0,s) has a simple form based on the Fisher information matrix [6].
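The forward procedure mentioned above can be illustrated with a minimal sketch for an ordinary HMM. This is the textbook scaled forward recursion, not the paper's proxy-HMM construction: the function name `forward_log_likelihood` and the argument names `pi`, `A`, and `B` are assumptions made for illustration, and the per-state observation likelihoods in `B` are taken as already computed.

```python
import numpy as np

def forward_log_likelihood(pi, A, B):
    """Scaled forward recursion for a standard HMM (illustrative sketch).

    pi : (N,)   initial state probabilities
    A  : (N, N) transition matrix, A[i, j] = P(state j at t+1 | state i at t)
    B  : (T, N) observation likelihoods, B[t, j] = p(x_t | state j)

    Returns the log-likelihood log L(X) and the filtered state
    probabilities P(q_t = j | x_1..x_t). Full a posteriori estimates
    would also need a backward pass, which is omitted here.
    """
    T, N = B.shape
    filtered = np.empty((T, N))
    log_lik = 0.0
    alpha = pi * B[0]                      # initialization
    for t in range(T):
        if t > 0:
            alpha = (alpha @ A) * B[t]     # propagate, then weight by likelihood
        c = alpha.sum()                    # scaling constant avoids underflow
        log_lik += np.log(c)
        alpha = alpha / c
        filtered[t] = alpha
    return log_lik, filtered

# Hypothetical two-state example with three time steps.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
B = np.array([[0.5, 0.1],
              [0.4, 0.3],
              [0.1, 0.9]])
log_lik, filtered = forward_log_likelihood(pi, A, B)
```

The cost is linear in the number of time steps, whereas summing over every state sequence explicitly grows exponentially. That saving is why mapping the allowable segmentations onto an HMM that the forward procedure can process makes the averaging tractable.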

PRACTICAL IMPLEMENTATION DETAILS
Slave Partitions
Efficient Implementation
Simulated Data
Speech Data