Speech Synthesis Method And Speech Synthesizer

Takahiro Kamai, Yumiko Kato

doi:10.1121/1.3582212

Abstract

A language processing portion (31) analyzes text from a dialogue processing section (20) and converts it into pronunciation and accent information. A prosody generation portion (32) generates an intonation pattern according to a control signal from the dialogue processing section (20). A waveform DB (34) stores prerecorded waveform data together with the pitch mark data assigned to it. A waveform cutting portion (33) cuts the desired pitch waveforms from the waveform DB (34). A phase operation portion (35) first removes phase fluctuation by standardizing the phase spectra of the pitch waveforms cut by the waveform cutting portion (33), and then imparts phase fluctuation by randomly diffusing only the high-frequency phase components according to the control signal from the dialogue processing section (20). The resulting pitch waveforms are placed at the desired intervals and superimposed.
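The phase operation and superposition steps can be illustrated with a minimal Python sketch. It is not the implementation described in the patent, and it rests on several assumptions: phase standardization is done by forcing a zero-phase spectrum, the components to be diffused are taken to be the phases of the high-frequency DFT bins, and the random diffusion is drawn from a uniform distribution. The function names and the parameters `cutoff_bin` and `spread` are hypothetical; they stand in for whatever the control signal from the dialogue processing section would set.

```python
import numpy as np


def standardize_phase(pitch_waveform):
    """Remove phase fluctuation (assumption: zero-phase standardization)."""
    n = len(pitch_waveform)
    amplitude = np.abs(np.fft.rfft(pitch_waveform))
    # A purely real, non-negative spectrum gives a symmetric waveform
    # whose peak sits at index 0; rotate it to the middle of the frame.
    zero_phase = np.fft.irfft(amplitude, n=n)
    return np.roll(zero_phase, n // 2)


def diffuse_high_phase(pitch_waveform, cutoff_bin, spread, rng):
    """Add random phase offsets to bins at or above cutoff_bin only.

    cutoff_bin and spread are hypothetical control parameters.
    """
    n = len(pitch_waveform)
    spectrum = np.fft.rfft(pitch_waveform)
    offsets = np.zeros(len(spectrum))
    offsets[cutoff_bin:] = rng.uniform(-spread, spread,
                                       len(spectrum) - cutoff_bin)
    if n % 2 == 0:
        # The Nyquist bin of a real signal must stay real.
        offsets[-1] = 0.0
    return np.fft.irfft(spectrum * np.exp(1j * offsets), n=n)


def overlap_add(pitch_waveforms, pitch_period):
    """Place pitch waveforms pitch_period samples apart and sum them."""
    longest = max(len(w) for w in pitch_waveforms)
    out = np.zeros(pitch_period * (len(pitch_waveforms) - 1) + longest)
    for i, w in enumerate(pitch_waveforms):
        start = i * pitch_period
        out[start:start + len(w)] += w
    return out


# Usage with a synthetic, Hann-windowed frame standing in for a pitch
# waveform cut from the waveform DB.
rng = np.random.default_rng(0)
frame = np.hanning(64) * rng.standard_normal(64)
flat = standardize_phase(frame)
voiced = [diffuse_high_phase(flat, cutoff_bin=8, spread=np.pi / 2, rng=rng)
          for _ in range(3)]
speech = overlap_add(voiced, pitch_period=40)
```

In this sketch both operations leave the amplitude spectrum untouched, so only the phase changes. Increasing `spread` or lowering `cutoff_bin` adds more fluctuation, and setting `spread` to zero returns the standardized waveform unchanged.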

Full Text