Exploring the Design Space of Automatically Generated Emotive Captions for Deaf or Hard of Hearing Users

Saad Hassan, Yao Ding, Christi Miller, Brenden Gilbert, John Burnett, Agneya Abhimanyu Kerure, Emily Biondo

doi:10.1145/3544549.3585880

Abstract

Caption text conveys salient auditory information to deaf or hard-of-hearing (DHH) viewers. However, the emotional information within the speech is not captured. We developed three emotive captioning schemas that map the output of audio-based emotion detection models to expressive caption text that conveys the underlying emotions. The three schemas used typographic changes to the text, color changes, or both. Next, we designed a Unity framework to implement these schemas and used it to generate stimulus videos. In an experimental evaluation with 28 DHH viewers, we compared the viewers’ ability to understand emotions and their subjective judgments across the three captioning schemas. We found no significant difference across the schemas in participants’ ability to understand the emotion based on the captions or in their subjective preference ratings. Open-ended feedback revealed factors that contribute to individual differences in preferences among the participants, as well as challenges with automatically generated emotive captions that motivate future work.

Full Text