Delta streams for English synthesis

Susan R Hertz

doi:10.1121/1.2025342

Abstract

Although speech patterns are heavily influenced by the hierarchical linguistic structure of speech, synthesis-by-rule systems have generally been based on linear utterance representations. Delta is a new programming language that makes it easy to work with utterance representations containing multiple levels. Both higher-level linguistic units, such as phrases, syllables, and phonemes, and lower-level phonetic events, such as articulatory or formant targets and F0 trends, can be easily accommodated on separate interconnected “streams,” with each unit equally accessible to the rules. While Delta can be used to test almost any synthesis model for any language, this paper will show how Delta can be used to test a particular model for English. This model uses, among others, CV, syllable, nucleus, phoneme, formant, and duration streams, with formant transitions represented as duration tokens that are in effect invisible in other streams. The paper will justify the selection of streams and the unique way of handling formant transitions, demonstrating in Delta notation how the model leads to particularly straightforward rules for predicting English phoneme durations, formant values, and aspiration patterns.
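The idea of interconnected streams can be illustrated with a minimal sketch. It is written in Python, not in Delta notation, and it is only one possible reading of the abstract: the `Token` and `Utterance` names, the integer sync marks, the syllable "ba", and all millisecond values are assumptions made for illustration and do not come from the paper. Tokens in every stream are anchored to shared sync marks, and the formant transition is a duration token lying between two marks that bound no phoneme, so it does not appear in the phoneme stream.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    """A unit in one stream, bounded by two shared sync marks (hypothetical model)."""
    value: object
    left: int
    right: int


@dataclass
class Utterance:
    """Maps a stream name to its tokens; all streams share the same sync marks."""
    streams: dict = field(default_factory=dict)

    def add(self, stream, value, left, right):
        self.streams.setdefault(stream, []).append(Token(value, left, right))

    def within(self, stream, left, right):
        """Values of the tokens in `stream` lying entirely between two sync marks."""
        return [t.value for t in self.streams.get(stream, [])
                if left <= t.left and t.right <= right]


# Illustrative syllable "ba" with sync marks 0..3 (all values are made up).
utt = Utterance()
utt.add("syllable", "syl", 0, 3)
utt.add("phoneme", "b", 0, 1)
utt.add("phoneme", "a", 2, 3)
utt.add("duration", 80, 0, 1)    # ms, steady portion of "b"
utt.add("duration", 40, 1, 2)    # ms, transition: present only in this stream
utt.add("duration", 150, 2, 3)   # ms, steady portion of "a"

phonemes_in_syllable = utt.within("phoneme", 0, 3)
syllable_ms = sum(utt.within("duration", 0, 3))
phonemes_in_transition = utt.within("phoneme", 1, 2)
transition_ms = utt.within("duration", 1, 2)
```

In this sketch a rule can ask for the phonemes of a syllable and see "b" and "a" as consecutive units, while a duration rule working on the same sync marks still finds the transition between them.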
