Synthesis of names using careful speech style

Anthony Bladon

doi:10.1121/1.404428

Abstract

A high-quality formant synthesizer, optimized for the transfer characteristics of the telephone network, has been implemented to provide real-time text-to-speech performance on a fraction of a TMS320C25 processor. A version optimized for the pronunciation of American names ("Name-to-Speech") has been created which incorporates phonetic-level changes characteristic of a careful speech style. Examples of the careful speech adaptations include a brief inter-word silence, a wider use of vowel-onset glottal stop, the restoration of plosive releases and of various elisions, the strengthening of some weak vowels, stress changes, reductions in formant coarticulation, and various segmental acoustic adjustments such as fricative gain. These careful speech features were determined after observing corresponding behaviors in an initial study of the way telephone users speak their own names (for the purpose of annotating their voice mail). In a follow-up study, users showed an overwhelming preference for the pronunciation of their own name in the careful style of synthesis, rather than in a fluent style typical of synthesized running text.
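
Two of the adaptations listed above, the brief inter-word silence and the wider use of vowel-onset glottal stop, can be illustrated as rewrite rules over a phoneme sequence. The following is a minimal sketch only, not the synthesizer's actual rule set: the ARPAbet-like vowel inventory, the symbols `SIL` and `Q`, the function names, and the example transcription are all assumptions made for illustration.

```python
# Hypothetical vowel inventory (ARPAbet-like symbols); an assumption, not taken from the paper.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

SIL = "SIL"  # hypothetical symbol for a brief inter-word silence
GLOTTAL = "Q"  # hypothetical symbol for a glottal stop


def fluent_style(words):
    """Concatenate the words' phonemes with no boundary marking."""
    return [phone for phones in words for phone in phones]


def careful_style(words):
    """Apply two illustrative careful-speech rules.

    1. Insert a brief silence between consecutive words.
    2. Insert a glottal stop before any word-initial vowel.
    """
    out = []
    for index, phones in enumerate(words):
        if index > 0:
            out.append(SIL)
        if phones and phones[0] in VOWELS:
            out.append(GLOTTAL)
        out.extend(phones)
    return out


# Assumed transcription of a two-word name, for illustration only.
name = [["AE", "N"], ["OW", "K", "S"]]
print(fluent_style(name))
print(careful_style(name))
```

The remaining adaptations (plosive release restoration, weak-vowel strengthening, stress changes, reduced formant coarticulation, fricative gain) act on segmental and acoustic parameters and would need a richer representation than a flat list of symbols.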
