Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis

Chi-Chun Hsia Chi-Chun Hsia,Jung-Yun Wu Jung-Yun Wu,Chung-Hsien Wu Chung-Hsien Wu

doi:10.1109/tasl.2010.2040791

Abstract

This paper proposes a method for modeling and generating pitch in hidden Markov model (HMM)-based Mandarin speech synthesis by exploiting prosody hierarchy and dynamic pitch features. The prosodic structure of a sentence is represented by a prosody hierarchy, which is constructed from the predicted prosodic breaks using a supervised classification and regression tree (S-CART). The S-CART is trained by maximizing the proportional reduction of entropy to minimize the errors in the prediction of the prosodic breaks. The pitch contour of a speech sentence is estimated using the STRAIGHT algorithm and decomposed into the prosodic features (static features) at prosodic word, syllable, and frame layers, based on the predicted prosodic structure. Dynamic features at each layer are estimated to preserve the temporal correlation between adjacent units. A hierarchical prosody model is constructed using an unsupervised CART (U-CART) for generating pitch contour. Minimum description length (MDL) is adopted in U-CART training. Objective and subjective evaluations with statistical hypothesis testing were conducted, and the results compared to corresponding results for HMM-based pitch modeling. The comparison confirms the improved performance of the proposed method.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis

Abstract

Talk to us

Similar Papers

More From: IEEE Transactions on Audio, Speech, and Language Processing

Lead the way for us

Journal: IEEE Transactions on Audio, Speech, and Language Processing	Publication Date: Nov 1, 2010
Citations: 67

Similar Papers

Modeling of Speech Parameter Sequence Considering Global Variance for HMM-Based Speech Synthesis
Tomoki Toda
-
Tomoki TodaTomoki Toda
19 Apr 2011
19 Apr 2011

An HMM-Based Approach to Flexible Speech Synthesis
Keiichi Tokuda
-
Keiichi TokudaKeiichi Tokuda
01 Jan 2006
01 Jan 2006

Multi-speaker modeling with shared prior distributions and model structures for Bayesian speech synthesis
Kei Hashimoto ... Yoshihiko Nankaku
-
Kei Hashimoto, et. al.Kei Hashimoto ... Yoshihiko Nankaku
27 Aug 2011
27 Aug 2011

Factored Maximum Penalized Likelihood Kernel Regression for HMM-Based Style-Adaptive Speech Synthesis
June Sig Sung ... Doo Hwa Hong
IEEE Journal of Selected Topics in Signal Processing | VOL. 8
June Sig Sung, et. al.June Sig Sung ... Doo Hwa Hong
01 Apr 2014
IEEE Journal of Selected Topics in Signal Processing | VOL. 8

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis

Abstract

Talk to us

Similar Papers

More From: IEEE Transactions on Audio, Speech, and Language Processing