A method for generation of Mandarin F0 contours based on tone nucleus model and superpositional model

Qinghua Sun, Keikichi Hirose, Nobuaki Minematsu

doi:10.1016/j.specom.2012.03.005

Abstract

A new method was proposed for synthesizing sentence fundamental frequency (F0) contours of Mandarin speech. The method represents a sentence's logarithmic F0 contour as a superposition of tone components on phrase components, as in the generation process model (F0 model). However, the method does not depend fully on that model: tone components are generated in a corpus-based way by concatenating F0 patterns predicted for the constituent syllables. Furthermore, prediction is done only for the stable part of each syllable's tone component, known as the tone nucleus, and the entire tone component is obtained by concatenating the predicted patterns. Since tone coarticulation has only a minor effect on tone nuclei, better prediction is possible than with conventional methods that handle full-syllable F0 contours, especially when the training corpus is small. While tone components are highly language-specific, phrase components are assumed to be more language-universal: a control scheme for phrase components developed for one language may be applicable, by analogy, to other languages. Also, phrase components cover a wider range of speech (phrase, clause, etc.) and are tightly related to higher-level linguistic information (syntax); therefore, concatenating short F0 contour fragments predicted by a corpus-based method is not appropriate for them. Taking these points into consideration, rules similar to those for Japanese were constructed to control phrase commands, from which phrase components were generated by simple mathematical calculations in the framework of the generation process model. Because phrase and tone components are tightly related, the two cannot be generated independently. To ensure that the correct relation holds in the synthesized F0 contour, a two-step scheme was developed in which information on the generated phrase components is used in predicting the tone components.
A listening test was conducted on speech synthesized using F0 contours generated by the developed method. The synthetic speech sounded highly natural, showing the validity of the method. Furthermore, an experiment on word emphasis showed that flexible F0 control is possible with the proposed method.
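The superposition described in the abstract can be illustrated with a minimal sketch. It assumes the standard form of the generation process (Fujisaki) model for phrase components, where each phrase command excites the impulse response Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, and the log F0 contour is the sum of a baseline, the phrase components, and the tone components. The function names, the value alpha = 3.0 per second, the baseline of 120 Hz, the command timings and magnitudes, and the flat placeholder tone pattern are all illustrative assumptions, not values from the paper. In particular, the tone component here is a hand-made array standing in for the corpus-predicted tone nucleus patterns.

```python
import math

def phrase_response(t, alpha=3.0):
    """Phrase-control impulse response: alpha^2 * t * exp(-alpha*t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def phrase_component(times, commands, alpha=3.0):
    """Sum the responses to all phrase commands, given as (onset_seconds, magnitude)."""
    return [sum(mag * phrase_response(t - onset, alpha) for onset, mag in commands)
            for t in times]

def superpose(times, base_hz, commands, tone_component, alpha=3.0):
    """Return F0 in Hz from ln F0(t) = ln(base) + phrase(t) + tone(t)."""
    phrase = phrase_component(times, commands, alpha)
    return [math.exp(math.log(base_hz) + p + c)
            for p, c in zip(phrase, tone_component)]

# Illustrative inputs only (assumed, not from the paper).
times = [i * 0.01 for i in range(200)]       # 2 s of 10 ms frames
commands = [(0.0, 0.5), (1.0, 0.3)]          # hypothetical phrase commands
tone = [0.0] * len(times)                    # placeholder tone component (log domain)
for i in range(30, 60):
    tone[i] = 0.2                            # hypothetical flat, high tone nucleus

f0 = superpose(times, 120.0, commands, tone)
```

Working in the logarithmic domain means each component scales F0 multiplicatively, so the same tone pattern rides on top of whatever phrase-level declination the commands produce. In the paper's two-step scheme the tone components would be predicted with knowledge of the phrase components rather than fixed in advance as they are here.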


Journal: Speech Communication
Publication Date: Apr 6, 2012
Citations: 10

Similar Papers

Corpus-based synthesis of Mandarin speech with F0 contours generated by superposing tone components on rule-generated phrase components
Keikichi Hirose ... Qinghua Sun
01 Dec 2008

Generation of fundamental frequency contours for Mandarin speech synthesis based on tone nucleus model
Qinghua Sun ... Nobuaki Minematsu
04 Sep 2005

Tone nucleus modeling for Chinese lexical tone recognition
Jinsong Zhang ... Keikichi Hirose
Speech Communication | VOL. 42
24 Jan 2004

Synthesis of fundamental frequency contours of a Japanese sentence using the junction rule of the phrase component
Masahiro Okada ... Shinji Ozawa
The Journal of the Acoustical Society of America | VOL. 76
01 Oct 1984
