Abstract

One of the advantages of statistical parametric speech synthesis is the ability to alter some of the characteristics of the speech e.g. change the speaker, expression etc. In this paper we present a technique to adapt an expressive single speaker deep neural network (DNN) speech synthesis model to a new speaker, allowing for both neutral and expressive speech in the new speaker's voice. Experiments show that the proposed adaptation technique achieves higher MOS scores on both neutral and expressive speech, and higher speaker similarity and slightly lower expression similarity scores on the expressive speech when compared with another DNN speaker adaptation technique.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.