Emotion-aware video applications (e.g., gaming, online meetings, online tutoring) adapt their content presentation to deliver a more engaging user experience. These services typically deploy a machine-learning model that continuously infers the user's emotion (from physiological signals, facial expressions, etc.) to guide content delivery. Training such models requires emotion ground-truth labels that are likewise collected continuously, typically as self-reports that users provide (via an auxiliary device such as a joystick) while watching videos. This continuous emotion annotation not only increases cognitive load and survey fatigue but also significantly degrades the viewing experience. To address this problem, we propose PResUP, a framework that opportunistically probes a user for emotion self-reports based on variations in the user's physiological responses. Specifically, the framework implements a sequence of phases: (a) constructing a user profile from physiological responses, (b) clustering similar users based on these profiles, and (c) training a parameterized activation-guided LSTM (Long Short-Term Memory) model on data shared among similar users to detect opportune self-report collection moments. Together, these steps reduce the continuous annotation overhead by probing only at opportune moments, without compromising annotation quality. We evaluated PResUP in a user study (N=36) in which participants watched eight videos while their physiological responses and continuous emotion self-reports were recorded. The key results reveal that PResUP reduces the annotation overhead by lowering the probing rate by an average of 34.80% and detects opportune probing moments with an average TPR of 80.07%, without compromising annotation quality. Motivated by these findings, we deployed PResUP in a follow-up user study (N=18). In this deployment, we observed similar performance in probing-rate reduction (average reduction of 38.05%) and opportune-moment detection (average TPR of 82.26%). These findings underscore the utility of PResUP in reducing continuous emotion annotation effort during video consumption.
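The abstract names the three phases only at a high level. As an illustration, below is a minimal, hypothetical sketch of how such a pipeline could be wired together. All function names, features, window sizes, and hyperparameters are assumptions, and a standard LSTM stands in for the paper's parameterized activation-guided variant, whose details the abstract does not give.

```python
# Hypothetical sketch of the three PResUP phases described in the abstract.
# Every name and parameter below is illustrative, not the paper's method.
import numpy as np
from sklearn.cluster import KMeans
import torch
import torch.nn as nn

# Phase (a): user profile construction from physiological responses.
def build_profile(signal_windows: np.ndarray) -> np.ndarray:
    """Summarize a user's windowed physiological signal (windows x samples)
    into a fixed-length profile vector (assumed: simple mean/std statistics)."""
    return np.concatenate([
        signal_windows.mean(axis=1).mean(keepdims=True),
        signal_windows.mean(axis=1).std(keepdims=True),
        signal_windows.std(axis=1).mean(keepdims=True),
    ])

# Phase (b): cluster similar users by their profiles (assumed: k-means).
def cluster_users(profiles: np.ndarray, n_clusters: int = 4) -> np.ndarray:
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(profiles)

# Phase (c): sequence model to detect opportune probing moments.
class ProbeMomentLSTM(nn.Module):
    """Per-timestep binary classifier: 1 = opportune moment to probe.
    A plain LSTM substitutes for the parameterized activation-guided LSTM."""
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, time, features)
        out, _ = self.lstm(x)
        return torch.sigmoid(self.head(out)).squeeze(-1)

# Minimal end-to-end run on synthetic data (36 users, as in the study).
rng = np.random.default_rng(0)
profiles = np.stack([build_profile(rng.normal(size=(20, 128)))
                     for _ in range(36)])
labels = cluster_users(profiles)

# Pool sequences from users in one cluster and train the detector on them.
cluster_ids = np.where(labels == labels[0])[0]
x = torch.randn(len(cluster_ids), 50, 3)              # (users, time, features)
y = (torch.rand(len(cluster_ids), 50) > 0.8).float()  # sparse opportune moments

model = ProbeMomentLSTM(n_features=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()
for _ in range(5):                                     # a few illustrative steps
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

The design choice being illustrated is the data-sharing step: rather than training one detector per user on scarce per-user labels, sequences from physiologically similar users are pooled so the opportune-moment detector can be trained on more data.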