Abstract

A multi-component emotion model is proposed to describe affective states comprehensively and to provide richer emotional detail for expressive speech synthesis. Four types of components, drawn from different perspectives, are involved: cognitive appraisal, psychological feeling, physical response, and utterance manner. Interactions among the components are also considered, so that the four together form a multi-layered structure. Based on this descriptive model, a detection method is proposed to extract affective states from text, the requisite first step toward automatic generation of expressive synthetic speech. A deep stacking network is adopted and integrated with the hypothesized production process of the four components, which makes the intermediate layers of the network visible and explicable. In addition, affective states at the document and paragraph levels are used as contextual features to extend the information available for emotion detection at the sentence level. The effectiveness of the proposed method is validated through experiments: at the sentence level, an F-value of 0.59 is achieved for predictions of utterance manner.
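The stacking scheme described above can be sketched as follows. This is a minimal, hypothetical illustration only: the layer sizes, feature dimensions, and weights are invented for the example (the weights are untrained), and the paper's actual architecture and training procedure are not specified in the abstract. The sketch shows the characteristic deep-stacking-network pattern, in which each module receives the raw input, the document/paragraph contextual features, and the outputs of all previous modules, with the four modules standing in for the appraisal, feeling, response, and utterance-manner layers.

```python
import numpy as np

def module_forward(x, w):
    # One stacking module: linear map followed by a sigmoid nonlinearity.
    return 1.0 / (1.0 + np.exp(-x @ w))

def dsn_forward(sentence_feats, context_feats, layer_dims, rng):
    """Forward pass of a deep stacking network (sketch).

    Each module sees the original sentence features, the contextual
    (document- and paragraph-level) features, and every previous
    module's output, so intermediate layers remain inspectable.
    """
    outputs = []
    x = np.concatenate([sentence_feats, context_feats])
    for d in layer_dims:
        # Untrained random weights; a real system would learn these.
        w = rng.normal(scale=0.1, size=(x.size, d))
        y = module_forward(x, w)
        outputs.append(y)
        x = np.concatenate([x, y])  # stack: feed output forward as extra input
    return outputs

rng = np.random.default_rng(0)
# Toy example: 16-dim sentence features, 4-dim contextual features,
# four modules for the four hypothesized emotion components.
layers = dsn_forward(rng.normal(size=16), rng.normal(size=4),
                     layer_dims=[8, 8, 8, 5], rng=rng)
print([o.shape for o in layers])
```

The design point the sketch captures is that, unlike an end-to-end black box, each module's output is a concrete intermediate representation that can be read off and interpreted against one of the four emotion components.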
