Abstract
This paper proposes a generative approach to construct high-quality speech synthesis from noisy speech. Studio-quality recorded speech is required to construct high-quality speech synthesis, but most of existing speech has been recorded in a noisy environment. A common method to use noisy speech for training speech synthesis models is reducing the noise before the vocoder-based parameterization. However, such multi-step processes cause an accumulation of spectral distortion. Meanwhile, statistical parametric speech synthesis (SPSS) without vocoders, which directly generates spectral parameters or waveforms, has been proposed recently. The vocoder-free SPSS will enable us to train speech synthesis models considering the noise addition process generally used in signal processing research. In the proposed approach, newly introduced noise generation models trained by a generative adversarial training algorithm randomly generates spectra of the noise. The speech synthesis models are trained to make the sum of their output and the randomly generated noise close to spectra of noisy speech. Because the noise generation model parameters fit the spectrum of the observed noise, the proposed method can alleviate the spectral distortion found in the conventional method. Experimental results demonstrate that the proposed method outperforms the conventional method in terms of synthetic speech quality.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.