Virtual reality is rapidly evolving into a pragmatically usable technology for mental health (MH) applications. As the underlying enabling technologies continue to mature and allow us to design more useful and usable structural virtual environments (VEs), the next important challenge will be to populate these environments with virtual representations of humans (avatars). This will be vital for creating mental health VEs that leverage avatars in applications requiring human-human interaction and communication. As Alessi et al.1 pointed out at the 8th Annual Medicine Meets Virtual Reality Conference (MMVR8), virtual humans have mainly appeared in MH applications to "serve the role of props, rather than humans." More believable avatars inhabiting VEs would open up possibilities for MH applications that address social interaction, communication, instruction, assessment, and rehabilitation issues. They could also enhance realism, which might in turn promote the experience of presence in VR. Additionally, it will soon be possible to use computer-generated avatars that provide believable, dynamic facial and bodily representations of individuals communicating from a distance in real time. This could support the delivery, in shared virtual environments, of interaction styles closer to the natural ones people use in real life. Such techniques could enhance communication and interaction by leveraging our natural sensing and perceiving capabilities, and they offer the potential to model human-computer-human interaction after human-human interaction. To enhance the authenticity of virtual human representations, advances in the rendering of the facial and gestural behaviors that support implicit communication will be needed.
In this regard, the current paper presents data from a study that compared human raters' judgments of emotional expression between actual video clips of facial expressions and identical expressions rendered on a three-dimensional avatar using a performance-driven facial animation (PDFA) system developed at the University of Southern California Integrated Media Systems Center. PDFA offers a means for creating high-fidelity visual representations of human faces and bodies. This effort explores the feasibility of sensing and reproducing a range of facial expressions with a PDFA system. To test the concordance of human ratings of emotional expression between video and avatar delivery, we first had facial-model subjects observe stimuli designed to elicit naturalistic facial expressions. The emotional stimulus induction involved presenting text, still-image, and video stimuli previously rated to induce facial expressions for the six universals2 of facial expression (happy, sad, fear, anger, disgust, and surprise), in addition to attentiveness, puzzlement, and frustration. Videotapes of the induced expressions that best represented prototypic examples of these emotional states, along with three-dimensional avatar animations of the same expressions, were presented in random order to 38 human raters. The raters identified each expression using open-ended, forced-choice, and seven-point Likert-type scales. The forced-choice and seven-point ratings provided the most usable data for determining video/animation concordance, and these data are presented.
To support a clear understanding of these data, a website has been set up (www.USCAvatars.com/MMVR) that allows readers to view the video and facial animation clips and to appreciate the assets and limitations of these facial expression-rendering methods. This methodological first step in our research program has provided valuable user-centered feedback to support the iterative design and development of avatar facial characteristics for the expression of emotional communication.
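As an illustration of how the video/animation concordance described above might be quantified, the following is a minimal sketch in Python. It assumes that each rater's forced-choice responses are stored as lists of labels keyed by the intended emotion; the data structures, function names, and agreement measure (per-emotion hit rate) are illustrative assumptions, not the study's actual analysis.

```python
# Hypothetical sketch: per-emotion concordance between forced-choice ratings
# of video clips and of the corresponding avatar animations.
# The rating data and the agreement measure are assumptions for illustration.

EMOTIONS = ["happy", "sad", "fear", "anger", "disgust", "surprise",
            "attentive", "puzzled", "frustrated"]

def hit_rate(ratings, intended):
    """Proportion of raters whose forced choice matched the intended emotion."""
    if not ratings:
        return 0.0
    return sum(1 for choice in ratings if choice == intended) / len(ratings)

def concordance_table(video_ratings, avatar_ratings):
    """
    video_ratings / avatar_ratings: dicts mapping an intended emotion to the
    list of forced-choice labels given by the raters for that clip.
    Returns per-emotion hit rates and their difference (video minus avatar).
    """
    table = {}
    for emotion in EMOTIONS:
        v = hit_rate(video_ratings.get(emotion, []), emotion)
        a = hit_rate(avatar_ratings.get(emotion, []), emotion)
        table[emotion] = {"video": v, "avatar": a, "difference": v - a}
    return table

# Example with made-up ratings from ten raters for the "happy" clips
video = {"happy": ["happy"] * 9 + ["surprise"]}
avatar = {"happy": ["happy"] * 7 + ["surprise"] * 3}
print(concordance_table(video, avatar)["happy"])
```

In practice, an agreement statistic such as Cohen's kappa computed over the pooled forced-choice ratings could complement the simple hit-rate comparison sketched here.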