High cognitive load arises from complex time and safety-critical tasks, for example, mapping out flight paths, monitoring traffic, or even managing nuclear reactors, causing stress, errors, and lowered performance. Over the last five years, our research has focused on using the multimodal interaction paradigm to detect fluctuations in cognitive load in user behavior during system interaction. Cognitive load variations have been found to impact interactive behavior: by monitoring variations in specific modal input features executed in tasks of varying complexity, we gain an understanding of the communicative changes that occur when cognitive load is high. So far, we have identified specific changes in: speech, namely acoustic, prosodic, and linguistic changes; interactive gesture; and digital pen input, both interactive and freeform. As ground-truth measurements, galvanic skin response, subjective, and performance ratings have been used to verify task complexity. The data suggest that it is feasible to use features extracted from behavioral changes in multiple modal inputs as indices of cognitive load. The speech-based indicators of load, based on data collected from user studies in a variety of domains, have shown considerable promise. Scenarios include single-user and team-based tasks; think-aloud and interactive speech; and single-word, reading, and conversational speech, among others. Pen-based cognitive load indices have also been tested with some success, specifically with pen-gesture, handwriting, and freeform pen input, including diagraming. After examining some of the properties of these measurements, we present a multimodal fusion model, which is illustrated with quantitative examples from a case study. The feasibility of employing user input and behavior patterns as indices of cognitive load is supported by experimental evidence. Moreover, symptomatic cues of cognitive load derived from user behavior such as acoustic speech signals, transcribed text, digital pen trajectories of handwriting, and shapes pen, can be supported by well-established theoretical frameworks, including O'Donnell and Eggemeier's workload measurement [1986] Sweller's Cognitive Load Theory [Chandler and Sweller 1991], and Baddeley's model of modal working memory [1992] as well as McKinstry et al.'s [2008] and Rosenbaum's [2005] action dynamics work. The benefit of using this approach to determine the user's cognitive load in real time is that the data can be collected implicitly that is, during day-to-day use of intelligent interactive systems, thus overcomes problems of intrusiveness and increases applicability in real-world environments, while adapting information selection and presentation in a dynamic computer interface with reference to load.
Read full abstract