Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis

Chenpeng Du, Kai Yu

doi:10.1109/taslp.2021.3133205

Abstract

Generating natural speech with a diverse and smooth prosody pattern is a challenging task. Although random sampling from a phone-level prosody distribution has been investigated as a way to generate different prosody patterns, the diversity of the generated speech is still very limited and far from what humans can achieve. This is largely due to the use of a uni-modal distribution, such as a single Gaussian, in prior work on phone-level prosody modelling. In this work, we propose a novel approach that models phone-level prosodies with a GMM-based mixture density network (MDN) and then extend it to multi-speaker TTS using speaker adaptation transforms of the Gaussian means and variances. Furthermore, we show that we can clone the prosodies from a reference speech by sampling prosodies from the Gaussian components that produce the reference prosodies. Our experiments on the LJSpeech and LibriTTS datasets show that the proposed method with GMM-based MDN not only achieves significantly better diversity than a single Gaussian in both single-speaker and multi-speaker TTS, but also provides better naturalness. The prosody cloning experiments demonstrate that the prosody similarity of the proposed method with GMM-based MDN is comparable to that of the recently proposed fine-grained VAE, while the target speaker similarity is better.
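
The two sampling ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a diagonal-covariance GMM whose weights, means, and standard deviations for one phone have already been predicted by some network, and it reads "the Gaussian components that produce the reference prosodies" as the component with the highest posterior for the reference vector, which is an assumption. The function names `sample_gmm`, `most_likely_component`, and `clone_sample` are hypothetical.

```python
import math
import random


def sample_gmm(weights, means, stds, rng):
    """Draw one prosody vector from a diagonal-covariance GMM.

    Picks a component according to the mixture weights, then samples each
    dimension from that component's Gaussian. Returns (component, vector).
    """
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return k, [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]


def log_gauss(x, mean, std):
    """Log density of x under a diagonal Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((v - m) / s) ** 2
        for v, m, s in zip(x, mean, std)
    )


def most_likely_component(x, weights, means, stds):
    """Index of the component with the highest posterior for x.

    Assumes all weights are strictly positive (log of zero is undefined).
    """
    scores = [
        math.log(w) + log_gauss(x, m, s)
        for w, m, s in zip(weights, means, stds)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def clone_sample(ref, weights, means, stds, rng):
    """Sample only from the component that best explains the reference.

    This is one possible reading of the prosody cloning step (assumption).
    """
    k = most_likely_component(ref, weights, means, stds)
    return k, [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy two-component mixture over a 2-dimensional prosody vector.
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [5.0, 5.0]]
    stds = [[1.0, 1.0], [1.0, 1.0]]
    print(sample_gmm(weights, means, stds, rng))
    print(clone_sample([4.8, 5.1], weights, means, stds, rng))
```

With a single Gaussian, every sample for a phone clusters around one mean. With several components, free sampling can land in distinct modes, while restricting sampling to the component selected from a reference keeps the sample near the reference prosody.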

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2022
Citations: 8

Similar Papers

Rich Prosody Diversity Modelling with Phone-Level Mixture Density Network
Chenpeng Du ... Kai Yu
30 Aug 2021

Voice conversion based on a mixture density network
Mohsen Ahangar ... Sudhendu Sharma
01 Oct 2017

Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS
Kenta Udagawa ... Yuki Saito
18 Sep 2022

Deep mixture density network for statistical model-based feature enhancement
Keisuke Kinoshita ... Takuya Higuchi
01 Mar 2017
