Automatic creation of CV templates for formant type speech synthesis based on HMM-based segmentation and syllable boundary detection

Takahiro Ohtsuka, Chang-Sheng Yang, Hideki Kasuya

doi:10.1121/1.422241

https://doi.org/10.1121/1.422241

Abstract

Formant type speech synthesis has a major advantage: it can generate speech with varied voice qualities and talker individualities. It has suffered from unnatural speech quality, however, not because of theoretical limitations but because the synthesis rules are incomplete; any insufficient approximation to the acoustics degrades the perceived quality of the synthetic speech. A novel formant type speech synthesizer for Japanese has been investigated that concatenates CV (consonant-vowel) formant-source templates obtained from natural utterances, using multiple sets of formant and voice source parameter values for each CV syllable. This paper describes an automatic method for creating the CV formant-source templates from a speech corpus. ARX (autoregressive with exogenous input) analysis is first used to extract formant and voice source parameters automatically, and HMM-based segmentation is then performed to locate the CV segments. The segments are further analyzed to detect the starting point of each syllable. A distance measure is used to decide the number of templates needed for each syllable. Experiments on 503 Japanese sentences show that the method is useful for creating CV templates.
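
The abstract does not specify the distance measure or how it determines the template count. As an illustration only, the sketch below assumes a Euclidean distance over fixed-length parameter vectors and a greedy rule: a token becomes a new template when no existing template lies within a threshold. The function names, the threshold, and the (F1, F2) values are all hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_templates(tokens, threshold):
    """Greedy selection (an assumption, not the paper's procedure).

    A token is kept as a new template only if it is farther than
    `threshold` from every template chosen so far.
    """
    templates = []
    for tok in tokens:
        if all(euclidean(tok, t) > threshold for t in templates):
            templates.append(tok)
    return templates


# Hypothetical (F1, F2) values in Hz for several tokens of one CV syllable.
tokens = [
    (700.0, 1200.0),
    (710.0, 1210.0),
    (400.0, 2000.0),
    (405.0, 1990.0),
    (690.0, 1195.0),
]

templates = select_templates(tokens, threshold=100.0)
print(len(templates))  # number of templates kept for this syllable
```

With a rule of this kind, a larger threshold yields fewer templates per syllable and a smaller one yields more, so the threshold trades template inventory size against acoustic coverage.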