Abstract

Collaborative problem-solving (CPS) is ubiquitous in everyday life, including work, family, and leisure activities. With collaborations increasingly occurring remotely, next-generation collaborative interfaces could enhance CPS processes and outcomes with dynamic interventions or by generating feedback for after-action reviews. Automatic modeling of CPS processes (called facets here) is a precursor to this goal. Accordingly, we build automated detectors of three critical CPS facets—construction of shared knowledge, negotiation and coordination, and maintaining team function—derived from a validated CPS framework. We used data from 32 triads who collaborated via commercial videoconferencing software to solve challenging problems in a visual programming task. We generated transcripts of 11,163 utterances using automatic speech recognition, which were then coded by trained humans for evidence of the three CPS facets. We used both standard and deep sequential learning classifiers to model the human-coded facets from linguistic, task-context, facial-expression, and acoustic–prosodic features in a team-independent fashion. We found that models relying on nonverbal signals yielded above-chance accuracies (area under the receiver operating characteristic curve, AUROC) ranging from .53 to .83, with increases in model accuracy when language information was included (AUROCs from .72 to .86). There were no advantages of deep sequential learning methods over standard classifiers. Overall, Random Forest classifiers using language and task-context features performed best, achieving AUROC scores of .86, .78, and .79 for construction of shared knowledge, negotiation/coordination, and maintaining team function, respectively. We discuss application of our work to real-time systems that assess CPS and intervene to improve CPS outcomes.
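The team-independent evaluation described above can be sketched with group-wise cross-validation, where utterances from the same triad never appear in both training and test folds. The sketch below uses synthetic data and illustrative shapes (the real features, labels, and hyperparameters are not specified in the abstract); it is not the authors' implementation.

```python
# Sketch of team-independent AUROC evaluation with a Random Forest,
# assuming utterance-level features and binary facet labels (synthetic here).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_utterances, n_features, n_teams = 600, 20, 32
X = rng.normal(size=(n_utterances, n_features))  # e.g., language + task-context features
# Synthetic binary label (facet present/absent) correlated with the first feature
y = (X[:, 0] + rng.normal(scale=0.5, size=n_utterances) > 0).astype(int)
teams = rng.integers(0, n_teams, size=n_utterances)  # triad ID per utterance

aucs = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=teams):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]  # probability of facet presence
    aucs.append(roc_auc_score(y[test_idx], scores))

print(f"mean AUROC over team-independent folds: {np.mean(aucs):.2f}")
```

`GroupKFold` guarantees that all utterances from a given triad fall into a single fold, so reported AUROCs reflect generalization to unseen teams rather than to unseen utterances from familiar teams.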
