Abstract

Text-to-speech synthesis systems are expected to produce speech that is both intelligible and natural. Conventional systems produce highly intelligible speech, but their naturalness is limited because any given text is synthesized in a neutral tone regardless of context. Many existing techniques for synthesizing emotional speech are data driven, and collecting a large amount of emotional data is tedious. Signal processing algorithms that modify neutral speech offer an alternative. The current work concentrates on incorporating happiness into neutral speech. Analysis reveals that happiness primarily affects the pitch contour and the intensity of speech, and that variations in these features are observed predominantly in the emotive keywords. Neutral speech is therefore transformed into happy speech by using signal processing algorithms to modify the pitch and intensity of the emotive keywords. In subjective assessment, the happy speech synthesized by the proposed method yields a mean opinion score of 2.53 out of a possible 3. The synthetic speech is also assessed objectively using a GMM-based emotion recognition system, and all the tested sentences are recognized as happy.
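
The keyword-level modification described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `apply_gain` and `scale_pitch_contour`, the 6 dB gain, the 1.2 pitch factor, and the segment boundaries are all hypothetical values chosen for illustration. The sketch only raises the intensity of a keyword segment in the waveform and scales a frame-level F0 contour over the same keyword. Resynthesizing speech from the modified contour (for example with a pitch-synchronous overlap-add method or a vocoder) is a separate step that is not shown.

```python
import numpy as np

def apply_gain(signal, start, end, gain_db, ramp=64):
    """Raise the intensity of signal[start:end] by gain_db decibels.

    Short linear ramps at both edges of the segment avoid audible clicks.
    All parameter values are illustrative assumptions, not from the paper.
    """
    out = np.asarray(signal, dtype=np.float64).copy()
    gain = 10.0 ** (gain_db / 20.0)          # dB to linear amplitude factor
    envelope = np.full(end - start, gain)
    r = min(ramp, (end - start) // 2)        # ramp cannot exceed half the segment
    if r > 0:
        envelope[:r] = np.linspace(1.0, gain, r)
        envelope[-r:] = np.linspace(gain, 1.0, r)
    out[start:end] *= envelope
    return out

def scale_pitch_contour(f0, start, end, factor):
    """Scale an F0 contour (Hz per frame) over frames [start, end).

    Unvoiced frames, marked with 0 Hz, stay at 0 after scaling.
    """
    out = np.asarray(f0, dtype=np.float64).copy()
    out[start:end] *= factor
    return out

# Hypothetical usage: a keyword spanning samples 200..400 and frames 1..4.
signal = np.ones(1000)
louder = apply_gain(signal, 200, 400, gain_db=6.0)

f0 = [0.0, 100.0, 120.0, 0.0, 110.0]
raised = scale_pitch_contour(f0, 1, 4, factor=1.2)
```

Samples and frames outside the keyword segment are left unchanged, which mirrors the observation that the variations are concentrated in the emotive keywords.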
