Abstract
Formant-based synthetic speech is less robust in noise than natural speech and is often criticized as “robot-like.” One reason may be the failure to model nonlocal spectral variation induced by segments in other syllables. This study investigates how a single consonant can affect vowel formant frequencies in nonadjacent as well as adjacent syllables. Sequences of /ə u i ə/ with intervening consonants were embedded in carrier phrases to give quasimeaningful sentences. The medial consonant was /z/ or /r/. The flanking consonants were either all /b/ or, in the syllables immediately before and after the medial consonant, /d/ (/bəbu{z,r}ibəb/, /bədu{z,r}idəb/). Primary stress fell either on the second syllable or on the first and third. Steady-state or midpoint frequencies of F1–F3 were measured from LPC spectra. Preliminary results confirm expectations: /r/ engenders lower formant frequencies than /z/ in less-stressed, all-/b/ contexts, even in nonadjacent syllables. Vowel stress and /d/ contexts tend to block the spread of the differences due to /z/ or /r/, presumably because the tongue is more constrained. Many of these differences are audible; their contribution to the robustness and naturalness of synthetic speech will be described. [Work partly supported by Telia Promotor Infovox AB.]
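The abstract does not state the LPC analysis settings. As a rough illustration of how formant frequencies can be read off an LPC spectrum, the sketch below pre-emphasizes and Hamming-windows a frame, computes its autocorrelation, solves for the predictor coefficients with the Levinson-Durbin recursion, and picks the local maxima of the resulting all-pole envelope. It is a minimal sketch, not the authors' procedure: the function names are hypothetical, and the predictor order (12), pre-emphasis factor (0.97), 5 Hz evaluation grid, and 90 Hz floor are illustrative assumptions.

```python
import math


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a (windowed) frame."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err) with a = [1, a1, ..., ap] defining
    A(z) = 1 + a1 z^-1 + ... + ap z^-p, and err the residual power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_envelope(a, fs, step_hz=5.0):
    """Evaluate the all-pole power envelope 1/|A(e^jw)|^2 from 0 Hz to fs/2."""
    freqs, mags = [], []
    for i in range(int(fs / 2 / step_hz) + 1):
        f = i * step_hz
        w = 2.0 * math.pi * f / fs
        re = sum(c * math.cos(w * k) for k, c in enumerate(a))
        im = -sum(c * math.sin(w * k) for k, c in enumerate(a))
        freqs.append(f)
        mags.append(1.0 / (re * re + im * im))
    return freqs, mags


def estimate_formants(frame, fs, order=12, pre_emphasis=0.97, fmin=90.0):
    """Return the frequencies (Hz) of all LPC-envelope peaks at or above fmin."""
    # Assumed pre-emphasis to flatten the spectral tilt of voiced speech.
    x = [frame[0]] + [frame[n] - pre_emphasis * frame[n - 1] for n in range(1, len(frame))]
    # Hamming window to reduce edge effects in the autocorrelation.
    n_len = len(x)
    x = [v * (0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_len - 1))) for n, v in enumerate(x)]
    a, _ = levinson_durbin(autocorr(x, order), order)
    freqs, mags = lpc_envelope(a, fs)
    # Local maxima of the envelope are formant candidates.
    return [
        freqs[i]
        for i in range(1, len(mags) - 1)
        if mags[i - 1] < mags[i] >= mags[i + 1] and freqs[i] >= fmin
    ]
```

For a frame placed at a vowel's steady state or midpoint, F1–F3 would then typically be taken as the lowest three plausible peaks, after discarding broad spurious ones; in practice such candidates are usually checked against expected formant ranges and neighbouring frames.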