Abstract

Predicting the roles of participants in conversations is a fundamental task in building a system that provides assessment results and feedback for each participant. Various role recognition models have been proposed. However, most studies have utilized only verbal or only nonverbal features, even though people usually express what they think or feel through a combination of language, gestures, and tone of voice. In this paper, we aim to realize a high-performance role recognition model by combining features from multiple modalities. We design nonverbal features that can be extracted from video and audio data. We then construct a multimodal leader identification method that fuses the nonverbal features we propose with verbal features proposed in a previous study. In our experiments, our multimodal model outperforms the baseline model that uses only verbal features. We also conduct analyses, including statistical tests and ablation studies, to verify the effectiveness of each modality and feature. Finally, we build a prototype feedback system and demonstrate how our study can be applied to discussion assessment and feedback systems.
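
To illustrate the kind of multimodal fusion described above, the following is a minimal sketch of feature-level (early) fusion for binary leader identification on synthetic data. The feature dimensions, feature names, and classifier are illustrative assumptions, not the actual feature set or model used in the paper.

```python
# Minimal sketch of early (feature-level) fusion for leader identification.
# All data here is synthetic; dimensions and feature semantics are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_participants = 400                  # hypothetical number of labeled participants
verbal_dim, nonverbal_dim = 32, 16    # hypothetical feature dimensions

# Per-participant feature vectors: verbal (e.g., lexical/dialogue features) and
# nonverbal (e.g., speaking time, gaze, prosody) -- placeholders for real features.
verbal = rng.normal(size=(n_participants, verbal_dim))
nonverbal = rng.normal(size=(n_participants, nonverbal_dim))
is_leader = rng.integers(0, 2, size=n_participants)  # 1 = leader, 0 = non-leader

# Early fusion: concatenate modality-specific vectors into one feature vector.
fused = np.concatenate([verbal, nonverbal], axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    fused, is_leader, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("F1 on held-out participants:", f1_score(y_test, clf.predict(X_test)))
```

A verbal-only baseline can be obtained by training the same classifier on the verbal columns alone, which mirrors the comparison against the verbal-only baseline mentioned in the abstract.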
