Abstract

This talk examines acoustic modeling of the fundamental units for speech recognition. Modeling approaches for both context-independent (CI) and context-dependent (CD) units are studied and evaluated on speaker-independent recognition for the DARPA Naval Resource Management task. The set of CI units in this study is a fixed set of 47 phone-like units (PLUs), each associated with a linguistically defined phoneme symbol. Each CI/PLU is modeled by a continuous density hidden Markov model (CDHMM) with Gaussian mixture state observation densities. The set of CD units includes PLUs defined by left context, by right context, and by both left and right context; only those CD/PLUs with enough occurrences in the training data are selected for modeling. Two approaches to modeling the CD/PLUs are presented. Both CI/PLU and CD/PLU models are trained with the segmental k-means procedure. For context-independent modeling, the maximum number of mixture components per state was varied from 1 to 256, and word accuracy increased from 56% to 89%, which indicates that sufficient acoustic resolution is essential for improved performance. The 89% word accuracy is the highest reported performance based on context-independent units. When context dependency is incorporated, both modeling approaches achieve better than 92% word accuracy.
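
The selection of context-dependent units by occurrence count can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the talk: the function names, the tuple encoding of a unit as `(left, center, right)` with `None` for an unused context, the `"sil"` padding at utterance boundaries, the threshold value, and the fallback order used when a unit was not selected.

```python
from collections import Counter


def count_context_units(transcriptions, boundary="sil"):
    """Count left-, right-, and both-context variants of every phone.

    `transcriptions` is a list of phone sequences. The `boundary` symbol
    padded at utterance edges is an assumption of this sketch.
    """
    counts = Counter()
    for phones in transcriptions:
        padded = [boundary] + list(phones) + [boundary]
        for i in range(1, len(padded) - 1):
            left, center, right = padded[i - 1], padded[i], padded[i + 1]
            counts[(left, center, right)] += 1  # left and right context
            counts[(left, center, None)] += 1   # left context only
            counts[(None, center, right)] += 1  # right context only
    return counts


def select_units(counts, min_count):
    """Keep only the context-dependent units seen at least `min_count` times."""
    return {unit for unit, n in counts.items() if n >= min_count}


def choose_unit(left, center, right, selected):
    """Pick a model for one phone in context.

    Hypothetical fallback order: both contexts, then left, then right,
    then the context-independent unit.
    """
    candidates = (
        (left, center, right),
        (left, center, None),
        (None, center, right),
    )
    for candidate in candidates:
        if candidate in selected:
            return candidate
    return (None, center, None)  # context-independent fallback


# Toy training transcriptions (made up for this example).
training = [["k", "ae", "t"], ["b", "ae", "t"], ["k", "ae", "b"]]
counts = count_context_units(training)
selected = select_units(counts, min_count=2)
unit = choose_unit("k", "ae", "t", selected)
```

In this toy data the unit with both contexts for "ae" between "k" and "t" occurs only once, so with a threshold of 2 the lookup falls back to the left-context unit. A real system would choose the threshold based on how much data is needed to train each model reliably.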
