Abstract
The acoustic realization of vowels with lexical stress generally differs substantially from that of their unstressed counterparts, which are more reduced in spectral quality, shorter in duration, and weaker in intensity, and which tend to have a flatter spectral tilt. In an automatic speech recognizer (ASR), it would therefore appear profitable to train separate models for the stressed and unstressed variants of each vowel. One problem is how to define the mapping from the theoretical stress of words to the actual realization of stress in fluent speech. We compared several hypotheses about this mapping, applying each in both training and testing of the recognizer. Results on an independent test set showed that using stress in this way did not increase the recognition rates of our ASR. Possible explanations are discussed and plans for future research are outlined.