Abstract

In conventional HMM-based speech synthesis, algorithms for generating high-quality reading-style (neutral) speech have been well investigated. However, human-like expressive speech synthesis is still far from practical, for many reasons. One influential factor is that the speech variability caused by the speaker's arousal is rarely addressed in speech synthesis. Accordingly, this paper proposes a novel speech synthesis method that takes this variability into account. Doing so offers two major advantages. First, the proposed method can generate time-variant, human-like, expressive speech. Second, it increases the diversity of expressive speech and mitigates the monotony characteristic of traditional speech synthesis systems. Experimental results show that the proposed method improves the diversity of synthetic speech and produces more expressive speech than conventional HTS.

