In a method and apparatus that synthesize speech by rule while using actual speech as auxiliary information, an input text is analyzed to obtain a word sequence. Prosodic information for the phoneme sequence of each word is set by referring to a word dictionary, and a speech waveform sequence is obtained from the phoneme sequence of each word by referring to a speech waveform dictionary. Additional prosodic information is extracted from input actual speech. At least one item of the set prosodic information or of the extracted prosodic information is selected and used to control the speech waveform sequence to create synthesized speech.
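The selection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the individual prosodic parameters, so pitch, duration, and power are assumed here, and the names `Prosody`, `select_prosody`, and `prefer_extracted` are hypothetical. The sketch covers only the choice between rule-set and extracted values, not the dictionary lookups or the waveform control itself.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Prosody:
    # Assumed prosodic parameters; the abstract does not enumerate them.
    pitch_hz: Optional[float] = None
    duration_ms: Optional[float] = None
    power_db: Optional[float] = None


PARAMS = ("pitch_hz", "duration_ms", "power_db")


def select_prosody(rule_set: Prosody, extracted: Prosody,
                   prefer_extracted: Set[str]) -> Prosody:
    """Choose, per parameter, the rule-set value or the value from actual speech.

    A parameter is taken from the actual speech only if it is listed in
    prefer_extracted and was actually extracted; otherwise the value set
    by rule (from the word dictionary) is kept.
    """
    chosen = {}
    for name in PARAMS:
        from_speech = getattr(extracted, name)
        if name in prefer_extracted and from_speech is not None:
            chosen[name] = from_speech
        else:
            chosen[name] = getattr(rule_set, name)
    return Prosody(**chosen)


# Hypothetical values for one phoneme.
rule = Prosody(pitch_hz=120.0, duration_ms=80.0, power_db=60.0)
speech = Prosody(pitch_hz=135.0, duration_ms=95.0)  # power not extracted
selected = select_prosody(rule, speech, {"pitch_hz", "power_db"})
print(selected)
```

In this example the pitch comes from the actual speech, the duration stays at its rule-set value because it was not requested, and the power falls back to the rule-set value because none was extracted. The selected values would then drive the control of the speech waveform sequence.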