Abstract

A key problem in current speech synthesis technology is to automatically generate an appropriate hierarchical prosodic structure from input text and incorporate it into the synthesized speech. This paper presents a method for addressing this problem in Mandarin Chinese. The method trains a statistical model on a speech database to generate the prosodic structure and to determine prosodic parameters such as syllable duration, pause, energy, and intonation. Experimental results show that the prosodic structure can be predicted with an accuracy of 83.1%. Furthermore, a Chinese text-to-speech system can be built on the proposed prosodic structure.
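The abstract does not specify which statistical model is used. As a rough illustration only, the sketch below assumes that the hierarchical prosodic structure is encoded as a break level at each juncture between adjacent words. It predicts each level from the part-of-speech pair on either side, choosing the level seen most often for that pair in an annotated training corpus. Prediction accuracy is the fraction of junctures labeled correctly. The four-level break scheme, the POS-pair feature, the toy data, and all function names are hypothetical and are not the authors' method.

```python
from collections import Counter, defaultdict

# Hypothetical break-level scheme (an assumption, not taken from the paper):
# 0 = no break, 1 = prosodic-word boundary,
# 2 = prosodic-phrase boundary, 3 = intonational-phrase boundary.

def boundary_features(pos_tags, i):
    """Assumed feature for the juncture between word i and word i+1: the POS pair."""
    return (pos_tags[i], pos_tags[i + 1])

def train(corpus):
    """corpus: list of (pos_tags, breaks), where len(breaks) == len(pos_tags) - 1."""
    table = defaultdict(Counter)
    overall = Counter()
    for pos_tags, breaks in corpus:
        for i, level in enumerate(breaks):
            table[boundary_features(pos_tags, i)][level] += 1
            overall[level] += 1
    # Back-off label for POS pairs never seen in training.
    default = overall.most_common(1)[0][0]
    # Keep the most frequent break level observed for each POS pair.
    model = {feat: counts.most_common(1)[0][0] for feat, counts in table.items()}
    return model, default

def predict(model, default, pos_tags):
    """Predict one break level per word juncture."""
    return [model.get(boundary_features(pos_tags, i), default)
            for i in range(len(pos_tags) - 1)]

def accuracy(model, default, corpus):
    """Fraction of junctures whose predicted break level matches the annotation."""
    correct = total = 0
    for pos_tags, breaks in corpus:
        for guess, gold in zip(predict(model, default, pos_tags), breaks):
            correct += guess == gold
            total += 1
    return correct / total if total else 0.0

# Toy annotated corpus (invented for illustration): POS tags plus break levels.
toy = [
    (["n", "v", "n"], [1, 2]),
    (["n", "v", "n"], [1, 2]),
    (["a", "n", "v"], [0, 1]),
]
model, default = train(toy)
print(predict(model, default, ["n", "v", "n"]))  # [1, 2]
print(accuracy(model, default, toy))             # 1.0
```

A real system would evaluate on held-out sentences rather than on the training data. It would also use richer context than a single POS pair, such as word length and distance to the previous break. The frequency table is used here only because it keeps the train/predict/score loop visible.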
