Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis

Heiga Zen,Andrew Senior

doi:10.1109/icassp.2014.6854321

Open Access

https://doi.org/10.1109/icassp.2014.6854321

Publication Date: May 1, 2014

Citations: 219

Affiliation: Concordia University

#Statistical Parametric Speech Synthesis #Modeling For Speech Synthesis

Abstract

Statistical parametric speech synthesis (SPSS) using deep neural networks (DNNs) has shown its potential to produce natural-sounding synthesized speech. However, there are limitations in the current implementation of DNN-based acoustic modeling for speech synthesis, such as the unimodal nature of its objective function and its inability to predict variances. To address these limitations, this paper investigates the use of a mixture density output layer. Such a layer can estimate full probability density functions over real-valued output features conditioned on the corresponding input features. Experimental results in objective and subjective evaluations show that the use of the mixture density output layer improves the prediction accuracy of acoustic features and the naturalness of the synthesized speech.
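To make the idea of a mixture density output layer concrete, the following is a minimal NumPy sketch of how raw network outputs could be mapped to a Gaussian mixture and scored by negative log-likelihood. The parameterization (softmax mixture weights, exponentiated log standard deviations, diagonal covariances), the output layout, and the function names `mdn_params` and `mdn_nll` are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def mdn_params(z, n_mix, dim):
    """Split raw outputs z of shape (batch, n_mix * (2 * dim + 1)) into
    log mixture weights, means, and log standard deviations.
    The layout [logits | means | log_sigmas] is an assumption of this sketch."""
    logits = z[:, :n_mix]
    means = z[:, n_mix:n_mix + n_mix * dim].reshape(-1, n_mix, dim)
    log_sigma = z[:, n_mix + n_mix * dim:].reshape(-1, n_mix, dim)
    # Log-softmax so the mixture weights are positive and sum to one.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_w = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return log_w, means, log_sigma

def mdn_nll(z, y, n_mix):
    """Mean negative log-likelihood of targets y (batch, dim) under the
    diagonal-covariance Gaussian mixture described by z."""
    dim = y.shape[1]
    log_w, means, log_sigma = mdn_params(z, n_mix, dim)
    # Standardized residual of each target against each mixture component.
    diff = (y[:, None, :] - means) / np.exp(log_sigma)
    # Log density of each diagonal Gaussian component.
    log_comp = (-0.5 * (diff ** 2).sum(axis=-1)
                - log_sigma.sum(axis=-1)
                - 0.5 * dim * np.log(2.0 * np.pi))
    # Log-sum-exp over components for numerical stability.
    a = log_w + log_comp
    m = a.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(a - m).sum(axis=1))
    return -log_lik.mean()

# Two identical unit-variance components centred at zero, scored at y = 0.
z = np.zeros((1, 6))   # n_mix=2, dim=1: 2 logits + 2 means + 2 log-sigmas
y = np.zeros((1, 1))
loss = mdn_nll(z, y, n_mix=2)
```

Because the layer outputs variances and several components, minimizing this loss fits a full conditional density rather than only a conditional mean, which is the contrast with a squared-error objective that the abstract draws.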
