On the model of global F0 shape for Japanese text-to-speech systems

Yasushi Ishikawa, Kunio Nakajima, Takashi Ebihara

doi:10.1121/1.416765

Abstract

Modeling F0 control is one of the most important problems for the naturalness of synthesized speech in Japanese TTS systems. In general, a two-stage model consisting of a global model and a local model is used for Japanese F0 control. The local model generates the F0 contour for each accent phrase; the global model generates the parameters of the local model from the linguistic information of an accent phrase. A typical parameter for the global model is one derived from the tree structure obtained by syntactic analysis. However, with such a global model it is difficult to express the syntactic context of phrases, and syntactic analysis is itself a difficult problem. A global model is proposed that integrates F0 shape generation and syntactic analysis. The model is represented as a network of states, each of which expresses a syntactic and prosodic state of the sentence. In the model, the linguistic class of each input accent phrase determines the state to move to, and a phrasal accent parameter for the local model is generated when the transition is taken. A training method for this network is also proposed. Prediction results showed that the model can predict the phrasal accent parameters with satisfactorily high accuracy, which strongly suggests that high-quality synthesized speech can be obtained with the model.
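
The state-network idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the state names, linguistic class labels, and parameter values below are all hypothetical, and the transition table is written by hand here, whereas the paper trains the network. The sketch only shows the mechanism in which the class of each accent phrase selects a transition and that transition emits a phrasal accent parameter.

```python
from typing import Dict, List, Tuple

# (current state, linguistic class of the accent phrase)
#   -> (next state, phrasal accent parameter)
# Every state name, class label, and value here is a hypothetical placeholder.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, float]] = {
    ("START", "noun_phrase"): ("SUBJECT", 1.0),
    ("START", "adverbial"): ("MODIFIER", 0.9),
    ("SUBJECT", "adverbial"): ("MODIFIER", 0.7),
    ("SUBJECT", "predicate"): ("END", 0.6),
    ("MODIFIER", "noun_phrase"): ("SUBJECT", 0.8),
    ("MODIFIER", "predicate"): ("END", 0.5),
}


def generate_phrasal_accents(classes: List[str], start: str = "START") -> List[float]:
    """Walk the network once per accent phrase and collect the emitted parameters."""
    state = start
    params: List[float] = []
    for linguistic_class in classes:
        key = (state, linguistic_class)
        if key not in TRANSITIONS:
            raise KeyError(f"no transition from {state!r} on class {linguistic_class!r}")
        # The class of the input phrase decides the next state; the transition
        # taken supplies the phrasal accent parameter for the local model.
        state, param = TRANSITIONS[key]
        params.append(param)
    return params


if __name__ == "__main__":
    print(generate_phrasal_accents(["noun_phrase", "adverbial", "predicate"]))
```

Because each parameter depends on the current state as well as the phrase class, the same class can receive different values in different contexts, which is how a network of this kind can carry syntactic context without a separate parse tree.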
