Abstract

In this paper, we focus on the problem of 3D dynamic (4D) facial expression recognition. While traditional methods rely on building deformation models on high-resolution 3D meshes, our approach works directly on low-resolution RGB-D sequences; this feature allows us to apply our algorithm to videos retrieved by widespread and standard low-resolution RGB-D sensors, such as Kinect. After preprocessing both RGB and depth image sequences, sparse features are learned from spatio-temporal local cuboids. Conditional Random Fields classifier is then employed for training and classification. The proposed system is fully-automatic and achieves superior results on three low-resolution datasets built from the 4D facial expression recognition dataset – BU-4DFE. Extensive evaluations of our approach and comparisons with state-of-the-art methods are presented.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call