Abstract

Audio-visual emotion expression by synthetic agents is widely employed in research, industrial, and commercial applications. However, the mechanism through which people judge the multimodal emotional displays of these agents is not yet well understood. This study aims to provide a better understanding of the interaction between the video and audio channels using a continuous dimensional evaluation framework of valence, activation, and dominance. The results indicate that congruent audio-visual presentation allows users to differentiate between happy and angry emotional expressions to a greater degree than either channel individually. Interestingly, however, sad and neutral emotions, which exhibit a lesser degree of activation, show more confusion when presented through both channels. Furthermore, when faced with a conflicting emotional presentation, users predominantly attended to the vocal channel. We speculate that this is most likely due to the limited level of facial emotion expression achievable with the current animated face. The results also indicate that there is no clear integration of the audio and visual channels in emotion perception of the kind observed in speech perception through the McGurk effect; instead, the final judgments were biased toward the modality with stronger expressive power.
