Abstract

Theories of vowel recognition based on vowel formant frequencies (or spectral peaks) identify vowels less accurately than human listeners do. This holds even when the model uses dynamic spectral information such as changes in the spectral peaks over time [H. Houde (2002)]. Studies with synthetic speech and edited natural speech have implicated duration and fundamental frequency as additional perceptual cues. The present work focused on duration, fundamental frequency, and spectral shape, using synthetic vowels modeled on the speech of a male and a female talker. Series from /i/ to /I/, /E/ to /ae/, /uh/ to /a/, and /U/ to /u/ were created in which only duration, fundamental frequency, or the shape of the short‐term spectrum was varied. Subsequent studies manipulated the formant frequencies orthogonally with either duration or spectral shape. Results indicated that the duration and spectral shape of a steady-state vowel exert a potent influence on vowel identification, while varying the fundamental frequency appeared to have little effect.
