Abstract

Current state-of-the-art approaches to emotion recognition primarily focus on modeling the nonverbal expressions of a single individual, without reference to contextual elements such as the co-presence of a partner. In this paper, we demonstrate that accurate inference of listeners' social-emotional state of attention also depends on accounting for the nonverbal behaviors of their storytelling partner, namely their speaker cues. To gain a deeper understanding of the role of speaker cues in attention inference, we investigate real-world interactions of children storytelling with their peers. Through in-depth analysis of human-human interaction data, we first identify nonverbal speaker cues (i.e., backchannel-inviting cues) and listener responses (i.e., backchannel feedback). We then demonstrate how speaker cues can modify the interpretation of attention-related backchannels as well as serve as a means to regulate the responsiveness of listeners. Finally, we discuss the design implications of our findings toward our primary goal of developing attention recognition models for storytelling robots. Social robots can use speaker cues to form more accurate inferences about the attentive state of their human partner.

Highlights

  • Storytelling is an interaction form that is mutually regulated between storytellers and listeners where a key dynamic is the back-and-forth process of speaker cues and listener responses

  • Since compounded cue contexts have a higher likelihood of eliciting a response from listeners, robot storytellers can manipulate their production of nonverbal speaker cues to deliberately gain more information

  • Further experimental validation is necessary to confirm the effectiveness of robot-generated speaker cues to boost attention recognition accuracies when incorporated into the model and evaluated in a human–robot interaction context


Summary

INTRODUCTION

Storytelling is an interaction form that is mutually regulated between storytellers and listeners, where a key dynamic is the back-and-forth process of speaker cues and listener responses. Speaker cues, called backchannel-inviting cues, are signaled nonverbally through changes in prosody, gaze patterns, and other behaviors. They serve as a mechanism for storytellers to elicit feedback from listeners (Ward and Tsukahara, 2000). Through a finer-grained analysis, we find that the interpretation of a listener's backchannels depends on the storyteller's cueing behaviors. This cue-response pair is necessary for an accurate understanding of the listener's attention. A common approach in affective computing is to model only the expressions of a single individual, without reference to external context such as the co-presence of a social agent. In using these technologies for storytelling robots, we miss out on the added value that the robot's own cueing actions can bring to the inference process. Using a logistic regression model, we find that backchannels are interpreted differently if observed after a weak, moderate, or strong cue. Section 5 (General Discussion) summarizes our findings from the human–human interaction studies and draws implications for modeling attention recognition in HRI.
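To make the logistic regression finding more concrete, the following is a minimal sketch of how cue strength can change the interpretation of a backchannel. It is not the authors' model: the feature coding (cue dummies plus cue-by-backchannel interaction terms), the names `TOY_COUNTS`, `features`, `fit`, and `predict`, and all of the counts are assumptions made up for illustration, and the fit uses plain gradient descent on grouped counts rather than any particular statistics library.

```python
import math

CUE_LEVELS = ("weak", "moderate", "strong")

# Hypothetical toy counts, NOT data from the study:
# (cue strength, backchannel observed) -> (attentive listeners, total listeners)
TOY_COUNTS = {
    ("weak", 0): (6, 10), ("weak", 1): (8, 10),
    ("moderate", 0): (4, 10), ("moderate", 1): (8, 10),
    ("strong", 0): (2, 10), ("strong", 1): (9, 10),
}


def features(cue, backchannel):
    """Intercept, backchannel, cue dummies (weak is the baseline), interactions."""
    moderate = 1.0 if cue == "moderate" else 0.0
    strong = 1.0 if cue == "strong" else 0.0
    b = float(backchannel)
    return [1.0, b, moderate, strong, b * moderate, b * strong]


def predict(weights, cue, backchannel):
    """Predicted probability that the listener is attentive."""
    z = sum(w * x for w, x in zip(weights, features(cue, backchannel)))
    return 1.0 / (1.0 + math.exp(-z))


def fit(counts, lr=1.0, epochs=20000):
    """Fit the logistic regression by gradient descent on grouped counts."""
    weights = [0.0] * 6
    total = sum(n for _, n in counts.values())
    for _ in range(epochs):
        grad = [0.0] * 6
        for (cue, backchannel), (attentive, n) in counts.items():
            x = features(cue, backchannel)
            # Expected attentive count minus observed attentive count.
            residual = n * predict(weights, cue, backchannel) - attentive
            for j in range(6):
                grad[j] += residual * x[j]
        weights = [w - lr * g / total for w, g in zip(weights, grad)]
    return weights


weights = fit(TOY_COUNTS)
for cue in CUE_LEVELS:
    with_bc = predict(weights, cue, 1)
    without_bc = predict(weights, cue, 0)
    print(f"{cue:>8}: P(attentive | backchannel)={with_bc:.2f}, "
          f"P(attentive | no backchannel)={without_bc:.2f}")
```

Because the interaction terms let the effect of a backchannel differ per cue level, the same observation (for example, no backchannel) maps to a different attention estimate depending on how strongly the storyteller cued. In this toy setup the gap between the two probabilities widens as the cue gets stronger, which is only a property of the invented counts, not a result reported in the paper.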

Context in Emotion Recognition— Humans vs Machines
Speaker Cues and Listener Responses—Children vs Adults
Overview
Method
Latency
Analysis of Inference Accuracy
Analysis of Inference Latency
Discussion
Data Collection
Storytelling Task
Video-Coded Annotations and Data Extraction
Analysis of Listener Behavior
Analysis of Speaker Cues
Table (columns: Cues, N, Rate, p-value)
Analysis of Cues and Responses to Predict State
Attention-Related Backchannels of Young Listeners
GENERAL DISCUSSION
Design Implication 1
Design Implication 2
Limitations
CONCLUSION
ETHICS STATEMENT