Method and apparatus for speech synthesis whereby waveform segments expressing respective syllables of a speech item are modified in accordance with rhythm, pitch and speech power patterns expressed by a prosodic template

Toshimitsu Minowa

doi:10.1121/1.1601088

Abstract

A method and apparatus for speech synthesis utilize a plurality of stored prosodic templates, each having been generated based on a series of enunciations of a single syllable executed in accordance with the rythm, pitch and speech power variations of an enunciated sample speech item, whereby the templates express rythm, speech power and pitch characteristics of respectively different sample speech items. Data representing an object speech item are converted to a sequence of acoustic waveform segments which respectively express the syllables of the speech item, the number of morae (syllable intervals) and the accent type of the speech item are judged and a prosodic template having the same number of morae and accent type is selected, and waveform shaping is applied to the waveform segments such as to match the rythm, speech power and pitch characteristics of the object speech item to those expressed by the selected prosodic template. The shaped acoustic waveform segments are then linked to form a continuous acoustic waveform, thereby obtaining synthesized speech which closely resembles natural speech.

Full Text