Abstract

We investigated the mouth-opening transition pattern (MOTP), which represents the change in the degree of mouth opening at the end of an utterance, and used it to predict the next speaker and the utterance interval, i.e., the interval between the end time of the current speaker’s utterance and the start time of the next speaker’s utterance, in multi-party conversations. We first collected verbal and nonverbal data, including speech and manually annotated degrees of mouth opening (closed, narrow-open, wide-open) of the participants, in four-person conversations. A key finding of the MOTP analysis is that the current speaker often keeps her mouth narrow-open during turn-keeping, whereas during turn-changing she either starts to close it after opening it narrowly or continues to open it widely. The next speaker often starts to open her mouth narrowly after closing it during turn-changing. Moreover, when the current speaker starts to close her mouth after opening it narrowly in turn-keeping, the utterance interval tends to be short. In contrast, when the current speaker and the listeners open their mouths narrowly after opening them narrowly and then widely, the utterance interval tends to be long. On the basis of these results, we implemented prediction models of the next speaker and the utterance interval using MOTPs. As a multimodal-feature fusion, we also implemented models that use eye-gaze behavior, one of the most useful sources of information for predicting the next speaker and the utterance interval according to our previous study, in addition to MOTPs. The evaluation results suggest that the MOTPs of the current speaker and the listeners are effective for predicting the next speaker and the utterance interval in multi-party conversations. Our multimodal-feature fusion model using both MOTPs and eye-gaze behavior is more useful for this prediction than models using only one or the other.
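As a rough, hypothetical illustration of how MOTP features of this kind might be operationalized in a prediction model, the following sketch encodes the transitions among the three annotated mouth states as counts and feeds them to an off-the-shelf classifier that labels an end-of-utterance observation as turn-keeping or turn-changing. This is not the authors' implementation; the sampling window, the toy data, and the choice of classifier are all assumptions made for illustration.

```python
# Minimal sketch (not the paper's implementation): encode a mouth-opening
# transition pattern (MOTP) observed near the end of an utterance and use it
# as input to a simple turn-keeping / turn-changing classifier.
# All labels, sequences, and parameters below are illustrative assumptions.

from itertools import product
from sklearn.ensemble import RandomForestClassifier  # any classifier would do

MOUTH_STATES = ["closed", "narrow", "wide"]  # closed / narrow-open / wide-open

def motp_features(mouth_states):
    """Count state-to-state transitions in a sequence of annotated
    mouth-opening states sampled around the utterance end."""
    pairs = list(product(MOUTH_STATES, repeat=2))
    counts = {p: 0 for p in pairs}
    for prev, curr in zip(mouth_states, mouth_states[1:]):
        counts[(prev, curr)] += 1
    return [counts[p] for p in pairs]

# Toy training data: the current speaker's mouth-state sequence near the end
# of an utterance, labelled 1 for turn-changing and 0 for turn-keeping.
train_sequences = [
    (["narrow", "narrow", "narrow"], 0),  # keeps mouth narrow-open -> keeping
    (["narrow", "closed", "closed"], 1),  # closes after narrow-open -> changing
    (["narrow", "wide", "wide"], 1),      # continues opening widely -> changing
    (["wide", "narrow", "narrow"], 0),
]

X = [motp_features(seq) for seq, _ in train_sequences]
y = [label for _, label in train_sequences]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Predict for a new end-of-utterance observation.
print(model.predict([motp_features(["narrow", "closed", "closed"])]))  # likely [1]
```

In the paper's setting, analogous transition features would also be computed for each listener, and gaze transition patterns could be concatenated to the same feature vector for the multimodal-feature fusion model.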

Highlights

  • The F-measure of the Multimodal-feature model (MuM), which uses both mouth-opening transition patterns (MOTPs) and gaze transition patterns (GTPs), was 0.800, significantly better than those of the All-mouth model (AmM) and the Eye-gaze model (EgM) (MuM vs. AmM: t(9) = 3.24, p < 0.01; MuM vs. EgM: t(9) = 1.94, p < 0.05). These results indicate that multimodal-feature fusion using both MOTPs and GTPs is more useful for predicting turn-keeping and turn-changing than using either the MOTPs or the GTPs individually (a sketch of this kind of paired comparison appears after this list)

  • We demonstrated that the current speaker’s and listeners’ MOTPs differ depending on the next speaker and the utterance interval in multi-party conversations
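The model comparison reported in the second highlight rests on paired t-tests over F-measures; t(9) corresponds to ten paired samples, for example ten conversation groups or folds. The following sketch shows how such a comparison can be reproduced in principle; the per-fold scores below are invented placeholders, not the paper's results.

```python
# Hedged sketch: compare per-fold F-measures of two models with a paired
# t-test (t(9) implies 10 paired samples). The scores are invented for
# illustration and do not reproduce the reported statistics.

from scipy import stats

mum_f = [0.82, 0.79, 0.81, 0.78, 0.80, 0.83, 0.79, 0.80, 0.81, 0.77]  # MuM per fold
amm_f = [0.75, 0.72, 0.76, 0.71, 0.74, 0.77, 0.73, 0.74, 0.75, 0.70]  # AmM per fold

t_stat, p_two_sided = stats.ttest_rel(mum_f, amm_f)
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2

print(f"t({len(mum_f) - 1}) = {t_stat:.2f}, one-sided p = {p_one_sided:.4f}")
```

Here the F-measure itself is the usual harmonic mean of precision and recall, 2PR / (P + R), computed per fold before the paired comparison.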

Introduction

People start to have face-to-face conversations with others immediately after they are born. Face-to-face communication is one of the most important activities through which people build social relationships with others. Multi-party face-to-face conversations, involving three or more persons, are very important for transmitting and sharing information, for understanding other people’s intentions and emotions, and for group decision making. If a computer can understand and predict how such multi-party conversations are carried out smoothly, it should be possible to develop a system that supports smooth communication and can engage in dialogue with people. In a multi-party conversation there are multiple listeners, in other words, multiple candidates for the next speaker, which makes it difficult to perform turn-changing. Participants cognitively predict, on the basis of verbal and nonverbal cues, the appropriate timing of turn changes, the appropriate next speaker, and the proper timing of the start of the next utterance. If a computational model can predict the next speaker and the utterance interval between the end time of the current speaker’s utterance and the start time of the next speaker’s utterance, it will be an indispensable technology for facilitating conversations between humans and between humans and conversational agents or robots.
