Abstract

Emotion recognition from facial expressions is a challenging research topic that has attracted a great deal of attention in recent years. This paper presents a novel method that uses a multi-modal strategy to extract emotion features from facial expression images. The basic idea is to combine a low-level empirical feature and a high-level self-learning feature into a single multi-modal feature. The 2-dimensional coordinates of facial key points are extracted as the low-level empirical feature, and the high-level self-learning feature is extracted by Convolutional Neural Networks (CNNs). To reduce the number of free parameters in the CNNs, small filters are used in all convolutional layers: a stack of small filters covers the same receptive field as a single large filter while requiring fewer parameters to learn. Label-preserving transformations are used to enlarge the dataset artificially, in order to address the over-fitting and data imbalance problems of deep neural networks. The two modal features are then fused linearly to form the facial expression feature. Extensive experiments are conducted on the extended Cohn–Kanade (CK+) dataset. For comparison, three kinds of feature vectors are evaluated: the low-level facial key-point feature vector, the high-level self-learning feature vector, and the multi-modal feature vector. The experimental results show that the multi-modal strategy achieves encouraging recognition results compared with the single-modal strategies.
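
The following is a minimal sketch, not the authors' code, illustrating two ideas from the abstract under the assumption of a PyTorch implementation: (1) stacking small 3×3 filters gives the same receptive field as one larger filter with fewer learnable parameters, and (2) the key-point coordinate vector can be fused with the CNN feature vector by simple concatenation before classification. The layer sizes, the 68-landmark count, and the `FusionHead` module are illustrative assumptions.

```python
# Sketch only: hypothetical layer sizes and landmark count, not the paper's exact architecture.
import torch
import torch.nn as nn

# (1) Parameter comparison: one 5x5 conv vs. two stacked 3x3 convs
# (same 5x5 receptive field, same input/output channels).
large = nn.Conv2d(64, 64, kernel_size=5, padding=2)
small = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(large), count(small))  # 102464 vs. 73856 learnable parameters

# (2) Multi-modal fusion: concatenate the low-level key-point feature
# (x, y coordinates of 68 assumed landmarks -> 136 values) with the
# high-level CNN feature vector, then classify the expression classes.
class FusionHead(nn.Module):
    def __init__(self, cnn_dim=256, keypoint_dim=136, num_classes=7):
        super().__init__()
        self.fc = nn.Linear(cnn_dim + keypoint_dim, num_classes)

    def forward(self, cnn_feat, keypoint_feat):
        fused = torch.cat([cnn_feat, keypoint_feat], dim=1)  # linear fusion by concatenation
        return self.fc(fused)

head = FusionHead()
logits = head(torch.randn(4, 256), torch.randn(4, 136))
print(logits.shape)  # torch.Size([4, 7])
```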
