Abstract
The acoustic realization of vowels with lexical stress generally differs substantially from that of their unstressed counterparts, which are more reduced in spectral quality, shorter in duration, weaker in intensity, and tend to have a flatter spectral tilt. Therefore, in a continuous speech recognizer (CSR) it would appear profitable to train separate models for the stressed and unstressed variants of each vowel. In the experiments reported here, we applied stress modeling in both training and testing of the recognizer. Recognition experiments on an independent test set showed that this use of stress did not improve recognition rates in our CSR. However, when we swapped the stress markers in the recognition lexicon, recognition rates deteriorated significantly. This demonstrated that the acoustic models for the stressed and unstressed variants of the vowels were indeed different. A pitfall of this experiment was that lexical stress information and phonemic context were possibly confounded. In a follow-up experiment we controlled for context by using generalized context-dependent models. Recognition results did not improve in this experiment either, although the vowel models were better tailored to capture lexical stress-related information. We conclude that the mapping of lexical stress to the acoustic surface of fluent speech is not sufficiently straightforward to be of direct benefit for CSR, because lexical stress interacts with rhythm and sentence accent in real speech.
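The stress-marker swap used as a control can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon entries, the vowel inventory, and the notation (a trailing "1" for a stressed vowel model, a trailing "0" for an unstressed one) are hypothetical and are not taken from the recognizer described here.

```python
# Hypothetical notation: "a1" = stressed vowel model, "a0" = unstressed one.
# The vowel inventory and lexicon entries are toy examples, not the paper's data.
VOWELS = {"a", "e", "i", "o", "u"}


def swap_stress(phones):
    """Return the phone sequence with every vowel's stress marker flipped."""
    swapped = []
    for phone in phones:
        base, marker = phone[:-1], phone[-1:]
        if base in VOWELS and marker in ("0", "1"):
            # Stressed becomes unstressed and vice versa.
            swapped.append(base + ("0" if marker == "1" else "1"))
        else:
            # Consonants carry no stress marker and are left unchanged.
            swapped.append(phone)
    return swapped


def swap_lexicon(lexicon):
    """Apply the swap to every pronunciation in a word -> phones mapping."""
    return {word: swap_stress(phones) for word, phones in lexicon.items()}


lexicon = {
    "banana": ["b", "a0", "n", "a1", "n", "a0"],
    "tomato": ["t", "o0", "m", "a1", "t", "o0"],
}
swapped = swap_lexicon(lexicon)
```

With a lexicon swapped in this way, each vowel is decoded with the model trained on the opposite stress class. If the two model sets were acoustically equivalent, recognition rates would be expected to stay roughly the same, so a drop suggests that the models differ.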