Abstract

In this paper, a multimodal learning method for facial expression recognition (FER) is proposed. The method makes a first attempt to learn a joint representation from the texture and landmark modalities of facial images, which are complementary to each other. To learn the representation of each modality as well as the correlation and interaction between modalities, structured regularization (SR) is employed to enforce modality-specific sparsity and density. By introducing SR, the full appearance of a facial expression is taken into account, so the method can not only handle subtle expressions but also remain robust to varying facial-image inputs. With the proposed multimodal learning network, the joint representation learned from multimodal inputs is better suited to FER. Experimental results on the CK+ and NVIE databases demonstrate the superiority of the proposed method.
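To make the structured-regularization idea concrete, the following is a minimal sketch, not the paper's actual implementation: a group-lasso-style penalty in which each hidden unit's incoming weights are partitioned into per-modality blocks, so that some blocks shrink toward zero (modality-specific sparsity) while the surviving blocks stay dense. All names and dimensions (texture_dim, landmark_dim, the 256-unit joint layer) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def structured_regularizer(W, modality_dims, lam=1e-3):
    """Group-lasso-style structured penalty (illustrative sketch).

    The incoming weights of each hidden unit are split into one block
    per modality; the L2 norm of each block is summed. Shrinking a
    block's norm to zero makes the unit modality-specific, while
    nonzero blocks remain dense within their modality.

    W: weight matrix of shape (sum(modality_dims), hidden_units)
    modality_dims: per-modality input sizes, e.g. [texture_dim, landmark_dim]
    lam: regularization strength (hypothetical value)
    """
    penalty = 0.0
    start = 0
    for d in modality_dims:
        block = W[start:start + d, :]  # weights from one modality only
        # sqrt(d) weighting is the standard group-lasso scaling;
        # norm over axis 0 gives one group norm per hidden unit.
        penalty += np.sqrt(d) * np.linalg.norm(block, axis=0).sum()
        start += d
    return lam * penalty

# Assumed example: 512-d texture features and 136-d landmark features
# (68 (x, y) points) feeding a 256-unit joint representation layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(512 + 136, 256))
print(structured_regularizer(W, [512, 136]))
```

In training, this penalty would be added to the FER loss so that gradient descent trades classification accuracy against the structured sparsity of the joint layer.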
