Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech

Masakazu Une, Shinnosuke Takamichi, Yuki Saito, Ryoichi Miyazaki, Daichi Kitamura, Hiroshi Saruwatari

doi:10.23919/apsipa.2018.8659691

Abstract

This paper proposes a generative approach for building high-quality speech synthesis from noisy speech. Studio-quality recorded speech is required to build high-quality speech synthesis, but most existing speech has been recorded in noisy environments. A common way to use noisy speech for training speech synthesis models is to reduce the noise before the vocoder-based parameterization. However, such multi-step processes cause an accumulation of spectral distortion. Meanwhile, statistical parametric speech synthesis (SPSS) without vocoders, which directly generates spectral parameters or waveforms, has been proposed recently. Vocoder-free SPSS enables us to train speech synthesis models while taking into account the noise addition process generally used in signal processing research. In the proposed approach, newly introduced noise generation models, trained by a generative adversarial training algorithm, randomly generate spectra of the noise. The speech synthesis models are trained to make the sum of their output and the randomly generated noise close to the spectra of noisy speech. Because the noise generation model parameters fit the spectrum of the observed noise, the proposed method can alleviate the spectral distortion found in the conventional method. Experimental results demonstrate that the proposed method outperforms the conventional method in terms of synthetic speech quality.
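The additive training objective described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify the spectral domain, the network architectures, or the loss. The sketch assumes that noise adds in a magnitude-like spectral domain, uses a mean squared error, and replaces the adversarially trained noise generation model with a hypothetical random sampler (`sample_noise_spectra`). All function and variable names are illustrative.

```python
import numpy as np

def sample_noise_spectra(rng, n_frames, n_bins, scale=0.1):
    """Hypothetical stand-in for a trained noise generation model:
    draws non-negative random noise magnitudes per frame and bin.
    (The paper trains this model adversarially; that step is omitted here.)"""
    return np.abs(rng.normal(0.0, scale, size=(n_frames, n_bins)))

def additive_noise_loss(pred_speech, sampled_noise, noisy_target):
    """Assumed objective: mean squared error between
    (speech model output + sampled noise) and the noisy spectra."""
    return float(np.mean((pred_speech + sampled_noise - noisy_target) ** 2))

rng = np.random.default_rng(0)
n_frames, n_bins = 200, 16

# Toy data: clean spectra plus noise observed in the recording.
clean = np.abs(rng.normal(1.0, 0.3, size=(n_frames, n_bins)))
observed_noise = sample_noise_spectra(rng, n_frames, n_bins)
noisy = clean + observed_noise

# Fresh noise drawn from the (stand-in) noise generation model.
sampled = sample_noise_spectra(rng, n_frames, n_bins)

# Compare two candidate speech model outputs under the additive objective:
# one that predicts the clean spectra, one that copies the noisy spectra.
loss_if_output_clean = additive_noise_loss(clean, sampled, noisy)
loss_if_output_noisy = additive_noise_loss(noisy, sampled, noisy)

print("loss when the output is clean:", loss_if_output_clean)
print("loss when the output is noisy:", loss_if_output_noisy)
```

In this toy setup, the candidate that outputs clean spectra receives the lower loss, because the sampled noise already accounts for the noise component of the target. This is meant only to convey the intuition behind training the speech synthesis model on the sum of its output and generated noise.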
