Generating intonation with a dynamic lexicon of prosodic prototypes

Veronique Auberge

doi:10.1121/1.409115

Abstract

A model of intonation is described that uses hierarchically stored intonation contours and inferences about grammatical structure to generate complex intonation patterns for text-to-speech (TTS). The model makes two assumptions. First, intonational structure shares points of articulation with other linguistic structures (morphological, syntactic, and semantic), even though these structures may be quite different. The common nodes in intonational and other structures are called "rendez-vous." Second, intonation contours, whose boundaries are marked by the rendez-vous, can be characterized by prosodic prototypes. Intonation is thus generated from the lexicon of prototypes, where parameters describing each prototype are defined by the rendez-vous. The lexicon itself is structured according to a rendez-vous hierarchy; it contains syllabic and supra-syllabic units associated with classes of F0 movements. For TTS, intonation generation is linked with a text parser that is able to deliver rendez-vous. A rule-governed multi-agent system allows interactive exchanges between the intonation generation module and the text analysis module. Interactions among the different language components (morphology, syntax, semantics) allow intonation parsing to facilitate the resolution of some structural ambiguities. In such a system, text analysis can be readily adapted for intonation generation in a variety of text and reading styles.
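
As an illustration only, the idea of a prototype lexicon organized by a rendez-vous hierarchy can be sketched in Python. Everything below is hypothetical and not taken from the model itself: the level names, the labels, the F0 values (offsets in semitones), the linear interpolation, and in particular the additive combination of contours across levels are assumptions made so that the sketch is runnable.

```python
from dataclasses import dataclass, field

# Hypothetical lexicon: (level, label) -> prototype shape, given as F0 offsets
# in semitones at evenly spaced points across the unit. Values are invented.
LEXICON = {
    ("sentence", "declarative"): [2.0, -2.0],   # overall fall
    ("group", "continuation"): [0.0, 2.0],      # rise toward a non-final boundary
    ("group", "final"): [0.0, -2.0],            # fall toward the final boundary
    ("syllable", "plain"): [0.0],               # no local movement
}

@dataclass
class Unit:
    """A node whose edges are rendez-vous delivered by a (hypothetical) parser."""
    level: str
    label: str
    children: list["Unit"] = field(default_factory=list)

def span(unit: Unit) -> int:
    """Number of syllables covered by a unit; a leaf counts as one syllable."""
    return 1 if not unit.children else sum(span(c) for c in unit.children)

def sample(shape: list[float], n: int) -> list[float]:
    """Linearly interpolate a prototype shape at the centers of n syllables."""
    m = len(shape)
    out = []
    for i in range(n):
        x = (i + 0.5) / n * (m - 1)      # position along the shape
        lo = int(x)
        hi = min(lo + 1, m - 1)
        frac = x - lo
        out.append(shape[lo] * (1 - frac) + shape[hi] * frac)
    return out

def generate(unit: Unit) -> list[float]:
    """Per-syllable F0 offsets: this unit's prototype plus its children's
    contours (additive superposition is an assumption of this sketch)."""
    own = sample(LEXICON[(unit.level, unit.label)], span(unit))
    if not unit.children:
        return own
    nested = [v for child in unit.children for v in generate(child)]
    return [a + b for a, b in zip(own, nested)]

def syllables(k: int) -> list[Unit]:
    return [Unit("syllable", "plain") for _ in range(k)]

sentence = Unit("sentence", "declarative", [
    Unit("group", "continuation", syllables(2)),
    Unit("group", "final", syllables(2)),
])
contour = generate(sentence)
print(contour)  # [2.0, 2.0, -1.0, -3.0]
```

In this toy setup the parser's role is reduced to supplying the tree of units; a different parse of the same text (for example, a different grouping of syllables) would select different prototypes and therefore yield a different contour.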
