Abstract

This paper proposes a method for generating natural prosody in text-to-speech systems. In this method, sentences are composed by inserting a variable word into each slot of a prepared sentence structure. The method suits domain-specific text-to-speech applications that need only a few sentence structures but a large vocabulary. The important prosodic parameters are the declination of the F0 contour, the accent strength of each word, and the position and duration of pauses. We therefore construct a database of these parameters, extracted manually from natural speech samples that have the sentence structures to be synthesized. During prosody generation, these parameters are retrieved according to the type of sentence structure, and the remaining parameters are generated by rules. The F0 contour is then generated by superposing these components on a baseline frequency. The mean opinion score for prosodic naturalness of speech synthesized by the proposed method is 3.6, which is 1.2 points higher than that of the former method.
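The abstract does not specify how the components are combined, so the following is only a minimal sketch of the idea under stated assumptions: a hypothetical database `PROSODY_DB` keyed by a hypothetical sentence-structure type `"weather_report"`, declination and accent strengths expressed in semitones, a rectangular accent over each word, superposition in the log-frequency (semitone) domain on a baseline `base_hz`, and word durations supplied from outside (standing in for the rule-generated parameters). Frames inside a pause are marked unvoiced with F0 = 0.

```python
from dataclasses import dataclass, field


@dataclass
class StructureParams:
    """Hypothetical prosody parameters stored for one sentence structure."""
    declination: float              # assumed: F0 decline in semitones per second
    accent_strengths: list          # assumed: accent height per word slot, in semitones
    pauses: dict = field(default_factory=dict)  # slot index -> pause (s) after that slot


# Hypothetical database; in the paper these values are extracted manually from speech.
PROSODY_DB = {
    "weather_report": StructureParams(
        declination=2.0, accent_strengths=[3.0, 1.5, 2.0], pauses={0: 0.2}
    ),
}


def generate_f0(structure, word_durations, base_hz=120.0, frame_s=0.01):
    """Return one F0 value (Hz) per frame; 0.0 marks a pause (unvoiced frame)."""
    params = PROSODY_DB[structure]  # retrieve parameters by sentence-structure type
    if len(word_durations) != len(params.accent_strengths):
        raise ValueError("one duration per word slot is required")

    # Lay out each word as (start, end, accent), inserting pauses after their slots.
    segments = []
    t = 0.0
    for slot, (dur, accent) in enumerate(zip(word_durations, params.accent_strengths)):
        segments.append((t, t + dur, accent))
        t += dur + params.pauses.get(slot, 0.0)

    n_frames = int(round(t / frame_s))
    f0 = []
    for i in range(n_frames):
        time = i * frame_s
        accent = next((a for s, e, a in segments if s <= time < e), None)
        if accent is None:
            f0.append(0.0)  # inside a pause
            continue
        # Superpose accent and declination (in semitones) on the baseline frequency.
        semitones = accent - params.declination * time
        f0.append(base_hz * 2 ** (semitones / 12))
    return f0
```

For example, `generate_f0("weather_report", [0.3, 0.4, 0.5])` yields 140 frames at 10 ms, with a 0.2 s unvoiced gap after the first word and F0 falling steadily within each word. Working in semitones makes the declination and accent components additive, which is one common way to realize superposition, but the paper may use a different domain or component shapes.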
