Speaker-Adaptive Neural Vocoders for Parametric Speech Synthesis Systems

Eunwoo Song, Jin-Seob Kim, Hong-Goo Kang, Kyungguen Byun

doi:10.1109/mmsp48831.2020.9287168

Abstract

This paper proposes speaker-adaptive neural vocoders for parametric text-to-speech (TTS) systems. Recently proposed WaveNet-based neural vocoding systems successfully generate a time sequence of speech signals within an autoregressive framework. However, it remains a challenge to synthesize high-quality speech when the amount of a target speaker's training data is insufficient. To generate more natural speech signals under the constraint of limited training data, we propose a speaker adaptation task with an effective variation of neural vocoding models. In the proposed method, a speaker-independent training method is first applied to capture universal attributes embedded in multiple speakers, and the trained model is then optimized to represent the specific characteristics of the target speaker. Experimental results verify that the proposed TTS systems with speaker-adaptive neural vocoders outperform those with traditional source-filter model-based vocoders and those with WaveNet vocoders trained either speaker-dependently or speaker-independently. In particular, our TTS system achieves mean opinion scores (MOS) of 3.80 and 3.77 for the Korean male and Korean female speakers, respectively, even though only ten minutes of speech is used for training the model.
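The two-stage recipe described in the abstract (speaker-independent training followed by adaptation to a target speaker) can be illustrated with a deliberately tiny sketch. This is not the authors' implementation: a two-parameter linear model stands in for the WaveNet vocoder, the "speakers" are synthetic lines with different slopes and offsets, and every name below (`gradient_descent`, `make_speaker_data`, the speaker parameters, the learning rate) is a hypothetical choice made for illustration only.

```python
# Toy illustration of "train speaker-independently, then adapt".
# Assumption: a linear model y = w*x + b stands in for the neural vocoder,
# and each synthetic "speaker" is defined by its own (w, b) pair.

def predict(params, x):
    w, b = params
    return w * x + b

def mse(params, data):
    """Mean squared error of the model on a list of (x, y) pairs."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def gradient_descent(params, data, lr=0.1, steps=200):
    """Full-batch gradient descent on the MSE, starting from `params`."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

def make_speaker_data(w, b, xs):
    """Noise-free samples from one synthetic speaker."""
    return [(x, w * x + b) for x in xs]

# Stage 1: speaker-independent (SI) training on pooled multi-speaker data.
training_speakers = [(1.8, -1.0), (2.0, 0.0), (2.2, 1.0)]  # hypothetical
pooled = []
for w, b in training_speakers:
    pooled += make_speaker_data(w, b, [-1.0, -0.5, 0.0, 0.5, 1.0])
si_params = gradient_descent((0.0, 0.0), pooled)

# Stage 2: adaptation. Start from the SI parameters and fine-tune on a
# small amount of target-speaker data (three samples, fewer update steps).
target_data = make_speaker_data(2.1, 0.5, [-1.0, 0.0, 1.0])
adapted_params = gradient_descent(si_params, target_data, steps=50)

si_loss = mse(si_params, target_data)
adapted_loss = mse(adapted_params, target_data)
print("SI model loss on target speaker:     ", si_loss)
print("Adapted model loss on target speaker:", adapted_loss)
```

In this sketch the SI model settles near the average of the training speakers, and the short fine-tuning run moves it toward the target speaker. The paper's reported gains concern perceptual quality (MOS) of a neural vocoder, which a toy regression like this does not measure.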
