Abstract

The combination of gestures, intonation, and textual content plays a key role in argument delivery. However, the current literature mostly considers textual content when assessing the quality of an argument, and is limited to datasets containing short sequences (18-48 words). In this paper, we study argument quality assessment in a multimodal context and experiment on DBATES, a publicly available dataset of long debate videos. First, we propose a set of interpretable debate-centric features, such as clarity, content variation, body movement cues, and pauses, inspired by theories of argumentation quality. Second, we design the Multimodal ARgument Quality assessor (MARQ), a hierarchical neural network model that summarizes the multimodal signals over long sequences and enriches the multimodal embedding with the debate-centric features. Our proposed MARQ model achieves an accuracy of 81.91% on the argument quality prediction task and outperforms established baseline models with an error-rate reduction of 22.7%. Through ablation studies, we demonstrate the importance of multimodal cues in modeling argument quality.
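
To make the described architecture concrete, the sketch below shows one way a hierarchical model could summarize long multimodal sequences and enrich the resulting embedding with hand-crafted debate-centric features, in the spirit of MARQ. This is an illustrative PyTorch sketch, not the authors' implementation; the class name, feature dimensions, and the two-level GRU design are assumptions made for the example.

```python
import torch
import torch.nn as nn


class HierarchicalMultimodalQualityModel(nn.Module):
    """Illustrative hierarchical fusion model (not the authors' MARQ code).

    Assumes per-segment text, audio, and visual feature sequences plus a
    hand-crafted debate-centric feature vector; all dimensions are
    placeholders chosen for the example.
    """

    def __init__(self, text_dim=768, audio_dim=128, video_dim=512,
                 debate_feat_dim=16, hidden_dim=256, num_classes=2):
        super().__init__()
        # Segment-level encoders summarize each modality within a segment.
        self.text_rnn = nn.GRU(text_dim, hidden_dim, batch_first=True)
        self.audio_rnn = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.video_rnn = nn.GRU(video_dim, hidden_dim, batch_first=True)
        # Debate-level encoder summarizes the sequence of fused segment embeddings.
        self.debate_rnn = nn.GRU(3 * hidden_dim, hidden_dim, batch_first=True)
        # Classifier consumes the debate summary enriched with debate-centric features.
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim + debate_feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text, audio, video, debate_feats):
        # text/audio/video: (batch, segments, frames, dim)
        # debate_feats: (batch, debate_feat_dim)
        b, s = text.shape[:2]

        def encode(rnn, x):
            # Collapse segments into the batch dimension, run the RNN over
            # frames, and keep the final hidden state as the segment embedding.
            flat = x.reshape(b * s, x.shape[2], x.shape[3])
            _, h = rnn(flat)
            return h[-1].reshape(b, s, -1)

        segments = torch.cat(
            [encode(self.text_rnn, text),
             encode(self.audio_rnn, audio),
             encode(self.video_rnn, video)], dim=-1)
        _, h = self.debate_rnn(segments)  # summarize the long segment sequence
        fused = torch.cat([h[-1], debate_feats], dim=-1)
        return self.classifier(fused)  # logits over argument-quality classes
```

In use, a batch of per-segment text, audio, and video feature sequences together with a debate-level feature vector would be passed to `forward`, which returns class logits for argument quality; the two-level recurrence is what lets the model cope with sequences far longer than the 18-48 word arguments in prior datasets.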
