Abstract

We present our studies on applying features that describe the occurrences of disfluency and non-verbal vocalisation (DIS-NV) in spoken utterances to the recognition of emotions in spontaneous dialogue. We carried out experiments on the AVEC2012 database of spontaneous dialogue to study the effectiveness of the proposed approach. Our results show that the DIS-NV features perform better than LLD or PMI features in predicting all emotion dimensions. The DIS-NV features are particularly predictive of the Expectancy emotion dimension, which relates to speaker uncertainty, and yield the best reported result for that dimension. An emotion recognition model using only the 5 DIS-NV features achieved overall performance comparable to the best reported result, which was obtained by a multimodal emotion recognition model using thousands of audiovisual and lexical features. These results confirm that the proposed DIS-NV features are predictive of emotions in spontaneous dialogue.

We use "turn" to denote the continuous speech produced by one speaker without interruption from the other speaker. Note that each speaker turn can contain one or more utterances, and consecutive utterances of a speaker may or may not belong to the same speaker turn. Our definition of speaker turn focuses on sense and integrity in speech production, which differs from "turn" in the context of a turn-taking system, where the focus is on the transition between different speakers.
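The definition of a speaker turn given above can be illustrated with a minimal sketch. It assumes a two-party dialogue stored as a time-ordered list of utterances with speaker labels, and it reads "without interruption" as "no utterance from the other speaker in between". The names `Utterance`, `Turn`, and `group_into_turns` are hypothetical and do not come from the paper or from the AVEC2012 tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Utterance:
    speaker: str  # speaker label, e.g. "A" or "B" (assumed format)
    text: str     # transcript of the utterance


@dataclass
class Turn:
    speaker: str
    utterances: List[Utterance] = field(default_factory=list)


def group_into_turns(utterances: List[Utterance]) -> List[Turn]:
    """Merge time-ordered utterances into speaker turns.

    Assumption: a turn ends as soon as the other speaker produces
    an utterance. Two utterances by the same speaker therefore share
    a turn only if nothing from the other speaker lies between them.
    """
    turns: List[Turn] = []
    for utt in utterances:
        if turns and turns[-1].speaker == utt.speaker:
            # Same speaker continues uninterrupted: extend the current turn.
            turns[-1].utterances.append(utt)
        else:
            # The speaker changed (or this is the first utterance): new turn.
            turns.append(Turn(speaker=utt.speaker, utterances=[utt]))
    return turns


# Illustrative dialogue (invented for this sketch).
dialogue = [
    Utterance("A", "Well, um, I was thinking."),
    Utterance("A", "Maybe we could go."),
    Utterance("B", "Sure."),
    Utterance("A", "Great."),
]
turns = group_into_turns(dialogue)
```

In this example, speaker A's first two utterances form one turn, while A's third utterance starts a new turn because B spoke in between. This is one way in which a speaker's consecutive utterances may or may not belong to the same turn.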
