A multimodal approach for modeling engagement in conversation

Arthur Pellet-Rostaing, Philippe Blache, Auriane Boudin, Roxane Bertrand, Stéphane Rauzy

doi:10.3389/fcomp.2023.1062342

Abstract

Recently, engagement has emerged as a key variable explaining the success of a conversation. From the perspective of human-machine interaction, automatic assessment of engagement becomes crucial both for better understanding the dynamics of an interaction and for designing socially aware robots. This paper presents a predictive model of the level of engagement in conversations. It shows in particular the value of using a rich multimodal feature set, with which the model outperforms existing models in this domain. Methodologically, the study is based on two audio-visual corpora of naturalistic face-to-face interactions. These resources have been enriched with various annotations of verbal and nonverbal behaviors, such as smiles, head nods, and feedback. In addition, we manually annotated gesture intensity. Based on a review of previous work in psychology and human-machine interaction, we propose a new definition of the notion of engagement, suitable for describing this phenomenon in both natural and mediated environments. This definition has been implemented in our annotation scheme. In our work, engagement is studied at the turn level, which is known to be crucial for the organization of conversation. Although there is still no consensus on the precise definition of turns, we developed a turn detection tool. A multimodal characterization of engagement is performed using a multi-level classification of turns. We claim that a set of multimodal cues, involving prosodic, mimo-gestural, and morpho-syntactic information, is relevant for characterizing the level of engagement of speakers in conversation. Our results significantly outperform the baseline and reach state-of-the-art level (0.76 weighted F-score). The most contributing modalities are identified by testing the performance of a two-layer perceptron trained on unimodal feature sets and on combinations of two to four modalities.

These results support our claim about multimodality: combining features related to speech fundamental frequency and energy with mimo-gestural features leads to the best performance.
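The evaluation design described above (scoring unimodal feature sets and combinations of two to four modalities, reported as a weighted F-score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the modality names are hypothetical placeholders loosely drawn from the abstract, the engagement labels are toy data rather than results from the corpora, and "weighted F-score" is assumed to mean the usual support-weighted mean of per-class F1 scores. Training a two-layer perceptron on each subset is omitted; a real pipeline would fit one model per subset and score its predictions with a function like `weighted_f1`.

```python
from collections import Counter
from itertools import combinations


def modality_subsets(modalities, max_size=4):
    """Yield every combination of 1..max_size modalities, smallest first."""
    for size in range(1, max_size + 1):
        yield from combinations(modalities, size)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, each weighted by its support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * n_true / total
    return score


# Hypothetical modality names, for illustration only (not the paper's feature groups).
MODALITIES = ["f0", "energy", "mimo_gestural", "morpho_syntactic"]

# One classifier would be trained and evaluated per subset: 4 + 6 + 4 + 1 subsets.
subsets = list(modality_subsets(MODALITIES))
print(len(subsets))  # 15

# Toy engagement labels for one hypothetical subset's predictions.
y_true = ["low", "low", "mid", "high", "high", "high"]
y_pred = ["low", "mid", "mid", "high", "high", "low"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.678
```

Weighting by support keeps frequent engagement levels from being drowned out by rare ones in the average, while still reporting a single figure comparable across modality subsets.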


Journal: Frontiers in Computer Science
Publication Date: Mar 2, 2023
Citations: 2
License type: CC BY 4.0