Abstract

Experiments indicate that non-nasal obstruents in human utterances can be replaced by “surrogate” segments, either produced by formant synthesis or recorded from other speakers, with virtually no change in speech quality or speaker identity [Hertz, Proc. IEEE 2002 Workshop on Speech Synthesis (2002)]. While the durational and spectral properties of the surrogate segments must be broadly appropriate to their target context, no speaker-specific tailoring is required. This paper describes follow-on experiments studying the perceptual consequences of replacing nasal consonants in human utterances with surrogate segments from different phonetic contexts, either synthesized or spoken by other speakers. These experiments indicate that the manipulated speech sounds natural when surrogate segment durations, and the formant transitions and nasalization characteristics of adjacent vowels, are appropriate. In certain contexts F0 is also perceptually salient. The spectral characteristics of surrogate nasal murmurs are often unimportant. In many cases, the perceived speech quality, phoneme identity, and speaker identity are unaffected even by a surrogate from a phoneme differing from the original. This paper highlights the perceptual results and explains their relevance to hybrid synthesis techniques that employ cross-speaker waveform concatenation and/or integrate waveform concatenation with formant synthesis. Utterances that exemplify these results will be played.
