Abstract

This paper describes a new approach that uses neural autoregressive distribution estimators (NADE) for spectral modeling in statistical parametric speech synthesis. To alleviate the over-smoothing of generated spectral structures, our previous work proposed a restricted Boltzmann machine (RBM) modeling method, in which the RBM represents the joint distribution of high-dimensional, physically meaningful spectral envelopes. However, the partition function of an RBM is intractable even for models of moderate size. In this paper, we introduce NADE to model the distribution of mel-cepstra and spectral envelopes at each HMM state, motivated by the simplicity with which it evaluates the probability of given observations. At synthesis time, the spectral parameters derived from the mode of each context-dependent NADE replace the Gaussian mean vector in the parameter generation process. Experimental results show that NADE models the distribution of the spectral features more accurately than the RBM. Furthermore, the proposed method significantly improves the naturalness of the conventional HMM-based speech synthesis system using mel-cepstra and outperforms RBM-based spectral modeling.
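
To illustrate why evaluating the probability of an observation is simple under NADE, the following is a minimal sketch of the standard binary NADE factorization, p(v) = prod_i p(v_i | v_<i), in NumPy. It is not the model used in the paper: the spectral features here are real-valued, so a real-valued variant would be needed, and the names nade_log_prob, W, V, b and c as well as the toy dimensions are assumptions made only for this example. The point is that every conditional is normalized, so the exact log-probability takes one pass over the dimensions and no partition function has to be computed.

```python
import itertools

import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def nade_log_prob(v, W, V, b, c):
    """Exact log p(v) for a binary NADE (illustrative, hypothetical helper).

    v: (D,) binary vector
    W: (H, D) input-to-hidden weights, V: (D, H) hidden-to-output weights
    b: (D,) output biases, c: (H,) hidden biases
    """
    a = c.copy()  # hidden pre-activation, depends only on v[:i] at step i
    log_p = 0.0
    for i in range(v.shape[0]):
        h = sigmoid(a)                   # hidden units given v[:i]
        p_i = sigmoid(b[i] + V[i] @ h)   # p(v_i = 1 | v[:i])
        log_p += v[i] * np.log(p_i) + (1.0 - v[i]) * np.log(1.0 - p_i)
        a += W[:, i] * v[i]              # fold v_i in for the next conditional
    return log_p


# Toy parameters (random, untrained) just to exercise the function.
rng = np.random.default_rng(0)
D, H = 6, 4
W = rng.normal(scale=0.5, size=(H, D))
V = rng.normal(scale=0.5, size=(D, H))
b = rng.normal(scale=0.5, size=D)
c = rng.normal(scale=0.5, size=H)

# Sanity check: probabilities of all 2**D binary vectors sum to one,
# which is what makes the likelihood tractable without normalization.
total = sum(
    np.exp(nade_log_prob(np.array(bits, dtype=float), W, V, b, c))
    for bits in itertools.product([0, 1], repeat=D)
)
```

In contrast, an RBM defines p(v) only up to a normalizing constant that sums over all joint configurations, which is why its exact likelihood is impractical to evaluate beyond small models.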

