Abstract
Formant-type speech synthesis has a major advantage: it can generate speech with a wide range of voice qualities and talker individualities. However, it has suffered from unnatural speech quality, not because of theoretical limitations but because the rule sets used for synthesis are incomplete. Any insufficient approximation of the acoustics degrades the perceived quality of the synthetic speech. A novel formant-type speech synthesizer for Japanese has been investigated that concatenates CV (consonant-vowel) formant-source templates obtained from natural utterances, using multiple sets of formant and voice source parameter values for each CV syllable. This paper describes an automatic method for creating the CV formant-source templates from a speech corpus. ARX (autoregressive with exogenous input) analysis is first used to automatically extract formant and voice source parameters, and HMM-based segmentation is then performed to locate the CV segments. The segments are further analyzed to detect the starting point of each syllable. A distance measure is used to decide how many templates are needed for each syllable. Experiments on 503 Japanese sentences show that the method is useful for creating CV templates.
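As an illustration of how a distance measure could determine the number of templates per syllable, the following is a minimal sketch, not the method used in the paper. It assumes that each syllable instance is summarized by a fixed-length parameter vector, that Euclidean distance is the measure, and that a fixed threshold decides when a new template is needed. The function name `select_templates`, the threshold value, and the example formant values are all hypothetical.

```python
import math


def select_templates(instances, threshold):
    """Greedy selection (an assumed scheme, not the paper's): keep an
    instance as a new template only if it lies farther than `threshold`
    from every template kept so far."""
    templates = []
    for vec in instances:
        if all(math.dist(vec, t) > threshold for t in templates):
            templates.append(vec)
    return templates


# Hypothetical (F1, F2) values in Hz for several instances of one CV syllable.
ka_instances = [
    (750.0, 1200.0),
    (760.0, 1210.0),
    (600.0, 1700.0),
    (755.0, 1195.0),
    (610.0, 1690.0),
]

# The threshold of 100 Hz is an arbitrary illustrative choice.
ka_templates = select_templates(ka_instances, threshold=100.0)
print(len(ka_templates), ka_templates)
```

With this scheme, instances that are acoustically close to an existing template are covered by it, so the number of templates for a syllable grows only with the variability observed in the corpus.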