Abstract

In recent years, multilayer perceptrons (MLPs) with many hidden layers, known as deep neural networks (DNNs), have performed surprisingly well in many speech tasks, e.g. speech recognition, speaker verification, and speech synthesis. In the context of F0 modeling, however, these techniques have not been properly exploited. In this paper, the Deep Belief Network (DBN), a member of the DNN family, is employed to model the F0 contour of speech synthesized by an HMM-based speech synthesis system. The experiments were conducted on the Bengali language. Several DBN-DNN architectures, ranging from four to seven hidden layers and with up to 200 hidden units per hidden layer, are presented and evaluated. The results are compared against the clustering-tree techniques commonly found in statistical parametric speech synthesis. We show that the DBN-DNN learns a high-level structure from textual inputs, which in turn improves the F0 contour in terms of both objective and subjective tests.
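The architecture family described above (an MLP with four to seven hidden layers and up to 200 units per layer, mapping textual/linguistic features to F0) can be sketched as a simple forward pass. This is a minimal illustration, not the paper's implementation: the 300-dimensional input, sigmoid hidden units, and linear log-F0 output are assumptions, and the random initialization stands in for the layer-wise RBM pre-training a DBN would actually use.

```python
import numpy as np

def make_mlp(sizes, rng):
    """Initialize a fully connected net (hypothetical dimensions).

    In a real DBN-DNN these weights would come from layer-wise
    RBM pre-training; random init is a placeholder here.
    """
    return [(rng.standard_normal((m, n)) * 0.01, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Forward pass: sigmoid hidden units, linear output (e.g. log-F0)."""
    h = x
    for i, (W, b) in enumerate(layers):
        z = h @ W + b
        # Last layer stays linear; hidden layers use a sigmoid.
        h = z if i == len(layers) - 1 else 1.0 / (1.0 + np.exp(-z))
    return h

rng = np.random.default_rng(0)
# Assumed 300-dim linguistic features -> 4 hidden layers of 200 units -> 1 F0 value
layers = make_mlp([300, 200, 200, 200, 200, 1], rng)
x = rng.standard_normal((8, 300))   # a batch of 8 frames
f0 = forward(layers, x)
print(f0.shape)  # (8, 1): one predicted F0 value per frame
```

After (pre-)training, such a network predicts one F0 value per input frame, and the per-frame predictions are concatenated into the F0 contour that the paper evaluates against clustering-tree baselines.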
