Abstract

Automatic recognition of facial expressions of emotion, and detection of facial action units (AUs), from videos depends critically on modeling their dynamics. These dynamics are characterized by changes in the temporal phases (onset-apex-offset) and intensity of the emotion/AUs, whose appearance varies considerably among subjects, making the recognition/detection task very challenging. While state-of-the-art Latent Conditional Random Fields (LCRF) efficiently encode these dynamics by modeling structural information (e.g., temporal consistency and ordinal constraints), their latent states are restricted to being either unordered (nominal) or fully ordered (ordinal). This is often too restrictive: in AU detection, for instance, sequences in which an AU is active may be better described by ordinal latent states (corresponding to the AU intensity levels), whereas sequences in which it is inactive may be better described by unordered (nominal) latent states. To this end, we propose the Variable-state LCRF model, which automatically selects the optimal latent states (nominal or ordinal) for each sequence from each target class. This unsupervised adaptation of the model to individual sequence or subject contexts allows improved model fitting and, in turn, enhanced predictive performance. Our experiments on four public expression databases (CK+, AFEW, MMI and GEMEP-FERA) show that the proposed model consistently outperforms state-of-the-art methods for both facial expression recognition and action unit detection from image sequences.
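For concreteness, the kind of latent-CRF formulation the abstract refers to can be sketched as follows; this is a generic sketch under standard LCRF assumptions, and the symbols (phi, theta_v, theta_e, beta, b_h) are illustrative rather than taken from the paper:

\[
p(y \mid \mathbf{x}) = \sum_{\mathbf{h}} \frac{\exp\{\Psi(y, \mathbf{h}, \mathbf{x})\}}{\sum_{y'} \sum_{\mathbf{h}'} \exp\{\Psi(y', \mathbf{h}', \mathbf{x})\}},
\qquad
\Psi(y, \mathbf{h}, \mathbf{x}) = \sum_{t} \phi(\mathbf{x}_t)^{\top} \theta_v(h_t, y) + \sum_{t} \theta_e(h_t, h_{t+1}, y),
\]

where \(\mathbf{h} = (h_1, \dots, h_T)\) are the latent states assigned to the frames of a sequence \(\mathbf{x}\), \(\theta_v\) are node (state-appearance) potentials and \(\theta_e\) are transition potentials. With nominal states, \(\theta_v\) is left unconstrained per state; with ordinal states, the node potential is instead tied to an ordinal-regression term in which a projection \(\beta^{\top}\phi(\mathbf{x}_t)\) must fall between ordered cutpoints, \(b_{h_t - 1} < \beta^{\top}\phi(\mathbf{x}_t) \le b_{h_t}\) with \(b_1 \le \dots \le b_{H-1}\), so the states inherit an intensity ordering. The proposed model's contribution, per the abstract, is to choose between these two parameterizations automatically for each sequence from each target class.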
