Abstract

One advantage of statistical parametric speech synthesis is the ability to alter characteristics of the speech, e.g. the speaker or the expression. In this paper we present a technique to adapt an expressive single-speaker deep neural network (DNN) speech synthesis model to a new speaker, allowing for both neutral and expressive speech in the new speaker's voice. Experiments show that, compared with another DNN speaker adaptation technique, the proposed adaptation technique achieves higher mean opinion scores (MOS) on both neutral and expressive speech and, on the expressive speech, higher speaker similarity but slightly lower expression similarity.
