Abstract

It is widely anticipated in HCI that computing will move to the background, weaving itself into the fabric of our everyday living and projecting the human user into the foreground. To realize this goal, next-generation computing (a.k.a. pervasive computing, ambient intelligence, and human computing) will need to develop human-centered user interfaces that respond readily to naturally occurring, multimodal human communication. These interfaces will need the capacity to perceive, understand, and respond appropriately to human intentions and cognitive-emotional states as communicated by social and affective signals. Motivated by this vision of the future, automated analysis of nonverbal behavior has attracted increasing attention in diverse disciplines, including psychology, computer science, linguistics, and neuroscience. Promising approaches have been reported, especially in the areas of facial expression and multimodal communication. Until recently, much of this work has focused on posed, often exaggerated expressions (for reviews, see Pantic & Rothkrantz, 2003; Tian et al., 2005; Zeng et al., 2009). Yet increasing evidence suggests that deliberate or posed behavior differs in appearance and timing from behavior that occurs in daily life. For example, brow raises have larger amplitude, faster onset, and shorter duration when posed than when spontaneous (Schmidt et al., 2009). The morphology of facial actions differs as well: as Littlewort et al. (2009) report in this issue, facial actions systematically differ between spontaneous and feigned pain. Approaches to automatic behavior analysis that have been trained on deliberate and typically exaggerated behaviors may therefore fail to generalize to the complexity of expressive behavior found in real-world settings.

This Special Issue of Image and Vision Computing brings together cutting-edge work on the automatic analysis of non-posed, real-world human behavior. It includes state-of-the-art reviews of computational approaches to conversation analysis (Gatica-Perez, 2009) and social signal processing (Vinciarelli et al., 2009), as well as recent advances in generic face modeling (Lucey et al., 2009), automated detection of pain from facial behavior (Littlewort et al., 2009; Ashraf et al., 2009), detection of cognitive states of interest from facial, vocal, and gestural behavior (Schuller et al., 2009), automatic detection of diverse human activities from spatio-temporal features (Oikonomopoulos et al., 2009), and automatic recognition of American Sign Language (Ding & Martinez, 2009). These papers represent an exciting advance toward human-centered interfaces that can perceive and understand real-world human behavior.

Of course, as discussed by Zeng et al. (2009) and, in this issue, by Vinciarelli et al. (2009) and Gatica-Perez (2009), significant scientific and technical challenges remain to be addressed. We are nonetheless optimistic about continued progress. A principal reason is that automatic multimodal analysis of naturalistic human behavior is a prerequisite for achieving next-generation, human-centered computing (Jaimes et al., 2006; Pantic et al., 2006, 2009), and the topic is poised to become one of the most active research areas in the computer vision and signal processing communities. To support these efforts, infrastructure is emerging from the extensive efforts of investigators, international sponsors, and professional societies.
A sampling of research activities includes basic research on machine analysis of human behavior (e.g., the European Research Council (ERC) MAHNOB project), automatic analysis of face-to-face and small-group interactions (e.g., the projects of the MIT Human Dynamics Laboratory and the European Commission (EC) FP6 AMIDA project), social signaling (e.g., the EC FP7 Social Signal Processing NoE project), human-computer interaction (e.g., the EC FP7 Semaine project), applications in mental health (Cohn et al., 2009), and other areas. The contributions in this Special Issue highlight recent advances and point to continued progress toward the goal of human-centered interfaces that can understand human intentions and behavior and respond intelligently.
