A speaking rate-controlled Mandarin TTS system

Chiao-Hua Hsieh, Sin-Horng Chen, Yih-Ru Wang, Chen-Yu Chiang

doi:10.1109/icassp.2013.6638999

Abstract

This paper proposes a new speaking rate-controlled Mandarin TTS system based on a speaking rate-dependent hierarchical prosodic model (SR-HPM) [6]. In the training phase, a data-driven approach automatically builds the SR-HPM directly from a large prosody-unlabeled speech database containing utterances of various speaking rates. The SR-HPM comprises 15 sub-models designed to describe the relationships among 3 types of prosodic-acoustic features of speech utterances, two types of prosodic tags specifying a 4-layer prosody hierarchy, linguistic features at various levels of the associated texts, and the speaking rates. In the test phase, the SR-HPM generates 4 prosodic-acoustic features: syllable pitch contours, syllable durations, syllable energy levels, and syllable-juncture pause durations. By combining these prosodic features with the spectral features generated by the HTS synthesizer, the system can generate natural speech at any speaking rate in the wide range of 0.15-0.3 seconds/syllable. A distinct feature of the system, its ability to control the occurrence frequencies of breaks of various types as well as their pause durations according to the given speaking rate, was demonstrated. A subjective test showed that MOS scores of 3.35, 3.44, and 3.28 were achieved for fast (SR = 0.17 sec/syllable), medium (SR = 0.2 sec/syllable), and slow (SR = 0.25 sec/syllable) synthetic speech, respectively.
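The abstract expresses speaking rate in seconds per syllable, so a smaller value means faster speech. As a rough illustration of that unit only, the sketch below converts the three test conditions to syllables per second and checks them against the reported 0.15-0.3 range. The function names are hypothetical and are not part of the described system. The sketch also assumes the rate is a plain average of seconds per syllable; the abstract does not say whether pauses are counted.

```python
# Range of speaking rates reported in the abstract, in seconds per syllable.
SR_MIN, SR_MAX = 0.15, 0.30


def syllables_per_second(sr_sec_per_syllable: float) -> float:
    """Convert a rate in seconds/syllable to syllables/second (its reciprocal)."""
    if sr_sec_per_syllable <= 0:
        raise ValueError("speaking rate must be positive")
    return 1.0 / sr_sec_per_syllable


def in_reported_range(sr_sec_per_syllable: float) -> bool:
    """Check whether a rate lies inside the range quoted in the abstract."""
    return SR_MIN <= sr_sec_per_syllable <= SR_MAX


# The three listening-test conditions quoted in the abstract.
conditions = {"fast": 0.17, "medium": 0.20, "slow": 0.25}

for name, sr in conditions.items():
    rate = syllables_per_second(sr)
    print(f"{name}: {sr:.2f} s/syllable = {rate:.2f} syllables/s, "
          f"in range: {in_reported_range(sr)}")
```

All three test conditions fall inside the reported range, and none of them sits at either extreme of it.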

Similar Papers

A parametric prosody coding approach for Mandarin speech using a hierarchical prosodic model
Chen-Yu Chiang
EURASIP Journal on Audio, Speech, and Music Processing | VOL. 2018
11 Jul 2018

ARTICULATION AND SPEAKING RATES IN BILINGUALS WITH REGARD TO TIME OF EXPOSURE TO ONE LANGUAGE
Mirosław Michalik ... Anna Solak
Acta Neuropsychologica | VOL. 18
15 May 2020

Durational Patterning at Syntactic and Discourse Boundaries in Mandarin Spontaneous Speech
Janice Fon ... Sally Chen
Language and Speech | VOL. 54
01 Sep 2010

Analysis and modeling of syllable duration for Thai speech synthesis
Chatchawarn Hansakunbuntheung ... Virongrong Tesprasit
01 Sep 2003
