Hierarchical stress generation with Fujisaki model in expressive speech synthesis

Jianhua Tao, Xiaoying Xu, Wei Lai, Keikichi Hirose, Ya Li

doi:10.21437/speechprosody.2014-196

Open Access

Abstract

This paper introduces a hierarchical stress generation method for expressive speech synthesis. In a previous study, we proposed a novel hierarchical Mandarin stress modeling method, and text-based stress prediction experiments demonstrated that reliable stress assignment can be obtained from textual features. However, the stress model still needs to be verified as an effective and efficient prosody model within a text-to-speech (TTS) system. In this work, the Fujisaki model, known as an ideal global representation of prosody, is adopted to construct the pitch contours. To demonstrate the effect of the stress model, the Fujisaki model parameters are automatically predicted from textual features both with and without stress information. Speech synthesized with stress information sounds more natural than speech synthesized without stress modeling. The root mean square error (RMSE) of the pitch contours and a feature importance analysis also show that stress information can improve pitch modeling. This work offers a promising approach to accurate pitch modeling for Mandarin expressive speech synthesis.
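The abstract relies on the Fujisaki model to construct pitch contours and on RMSE to compare them. As a minimal sketch, assuming the standard command-response formulation, the log F0 contour is the sum of a baseline ln Fb, phrase components (responses Gp to impulse-like phrase commands) and tone components (responses Ga to pedestal-shaped commands between an onset and an offset): ln F0(t) = ln Fb + Σi Api·Gp(t − T0i) + Σj Aaj·[Ga(t − T1j) − Ga(t − T2j)]. The constants α = 3.0/s, β = 20.0/s and γ = 0.9 are commonly used defaults, not values reported in this paper. All class and function names are hypothetical, and the paper's prediction of these parameters from textual and stress features is not shown. Tone amplitudes are signed because tone languages such as Mandarin are often modeled with commands of both polarities. RMSE is computed here in Hz, which is an assumption, since the abstract does not state the unit.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PhraseCommand:
    onset: float      # T0i: time of the phrase impulse, in seconds
    amplitude: float  # Api: magnitude of the phrase command


@dataclass
class ToneCommand:
    onset: float      # T1j: start of the pedestal-shaped command, in seconds
    offset: float     # T2j: end of the command, in seconds
    amplitude: float  # Aaj: signed, so a command can lower as well as raise F0


# Default constants below are commonly used values, not taken from this paper.
def phrase_response(t: float, alpha: float = 3.0) -> float:
    """Phrase control mechanism: Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def tone_response(t: float, beta: float = 20.0, gamma: float = 0.9) -> float:
    """Tone control mechanism: Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(
    times: Sequence[float],
    fb: float,
    phrases: Sequence[PhraseCommand],
    tones: Sequence[ToneCommand],
    alpha: float = 3.0,
    beta: float = 20.0,
    gamma: float = 0.9,
) -> List[float]:
    """Return F0 in Hz at each time point, built by summing components in the log domain."""
    contour = []
    for t in times:
        log_f0 = math.log(fb)  # baseline frequency Fb
        for p in phrases:
            log_f0 += p.amplitude * phrase_response(t - p.onset, alpha)
        for c in tones:
            # Pedestal command = step response at onset minus step response at offset.
            log_f0 += c.amplitude * (
                tone_response(t - c.onset, beta, gamma)
                - tone_response(t - c.offset, beta, gamma)
            )
        contour.append(math.exp(log_f0))
    return contour


def rmse(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equal-length contours (Hz here, an assumed unit)."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("contours must be non-empty and of equal length")
    squared = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(squared / len(reference))


if __name__ == "__main__":
    # Hypothetical commands for a short utterance sampled every 10 ms.
    times = [i * 0.01 for i in range(150)]
    phrases = [PhraseCommand(onset=0.0, amplitude=0.5)]
    target = fujisaki_f0(times, 120.0, phrases, [ToneCommand(0.2, 0.5, 0.4)])
    # A hypothetical prediction that misses the tone command entirely.
    predicted = fujisaki_f0(times, 120.0, phrases, [])
    print(f"RMSE: {rmse(target, predicted):.2f} Hz")
```

Running the script compares a target contour with a prediction that omits its tone command and prints the resulting RMSE; a prediction that reproduced every command exactly would score 0.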