Abstract

Providing feedback to a speaker is an essential communication signal for maintaining a conversation. In specific feedback, which indicates the listener's reaction to the speaker's utterances, facial expression is an effective modality for conveying the listener's reactions. Moreover, not only the type of facial expression but also its intensity may influence the meaning of the specific feedback. In this study, we propose a multimodal deep neural network model that predicts the intensity of facial expressions co-occurring with feedback responses. We focus on multiparty video-mediated communication. In video-mediated communication, close-up frontal face images of each participant are continuously presented on the display, so the attention of the participants is more likely to be drawn to facial expressions. We assume that in such communication, the importance of facial expression in the listeners' feedback responses increases. We collected 33 video-mediated conversations by groups of three people and obtained audio and speech data for each participant. Using the collected corpus as a dataset, we created a deep neural network model that predicts the intensity of 17 types of action units (AUs) co-occurring with the feedback responses. The proposed method employs a GRU-based model with an attention mechanism for the audio, visual, and language modalities. A decoder was trained to produce the intensity values for the 17 AUs frame by frame. In the experiment, unimodal and multimodal models were compared in terms of their performance in predicting salient AUs that characterize facial expression in feedback responses. The results suggest that which models perform well depends on the AU category; audio information was useful for predicting AUs that express happiness, and visual and language information contributed to predicting AUs expressing sadness and disgust.
