Data Augmentation and Second-Order Pooling for Facial Expression Recognition

Xiaoyun Tong, Songlin Sun, Meixia Fu

doi:10.1109/access.2019.2923530

Abstract

Facial expression is the main medium of information transmission in human communication and plays an important role in people’s daily lives. Facial expression recognition remains challenging because of variations in occlusion, illumination, and pose. However, most existing works focus on deeper or wider network structures and rarely explore high-level feature statistics. In this paper, we propose a second-order pooling convolutional neural network that captures the correlation information between the facial features learned by the deep network. At the final stage of the network, a new covariance pooling layer replaces the first-order pooling of standard convolutional networks. In the covariance pooling layer, the matrix square root is approximated with the Newton iteration method instead of eigendecomposition (EIG) or singular value decomposition (SVD), which makes the layer better suited to GPU computation. Because facial expression data are scarce, this paper uses several data augmentation methods to enlarge the training set and improve the generalization ability of the model. The proposed method, data augmentation and second-order pooling (DASOP), was evaluated on the real-world affective faces database (RAFDB) and the static facial expressions in the wild (SFEW) database, yielding accuracies of 88.625% and 59.518%, respectively. These results are state of the art and surpass existing methods.

Highlights

Facial expression is the most natural and direct way to reflect people’s inner emotions and thoughts
This paper proposes deep convolutional neural networks based on data augmentation and second-order pooling for facial expression recognition
We evaluate five classical CNN architectures as backbone networks for data augmentation and second-order pooling (DASOP); Table 1 compares their accuracies on the real-world affective faces database (RAFDB) and static facial expressions in the wild (SFEW)

Summary

INTRODUCTION

Facial expression is the most natural and direct way to reflect people’s inner emotions and thoughts. The amount of data in public facial expression recognition databases is limited, so data augmentation can be used to increase the diversity of training samples, improve the robustness and generalization ability of the model, and avoid over-fitting [3]. Deep facial expression recognition faces two key problems: (1) over-fitting caused by insufficient training data; and (2) facial expression changes are subtle and variable, so first-order information alone does not provide enough discriminative information. To overcome these issues, this paper proposes deep convolutional neural networks based on data augmentation and second-order pooling for facial expression recognition.
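As an illustration of how data augmentation increases the diversity of training samples, the following is a minimal NumPy sketch. The summary does not list which augmentation methods the authors use, so the three transforms here (horizontal flip, padded random crop, brightness shift), the function name `augment`, and the parameters `pad` and `max_brightness_shift` are all assumptions chosen for illustration.

```python
import numpy as np


def augment(image, rng, pad=4, max_brightness_shift=0.2):
    """Return a randomly perturbed copy of a face image.

    image: float array of shape (H, W) or (H, W, C) with values in [0, 1].
    The transforms are illustrative assumptions, not the paper's exact recipe.
    """
    height, width = image.shape[:2]
    out = image

    # Random horizontal flip: expressions are roughly left-right symmetric.
    if rng.random() < 0.5:
        out = out[:, ::-1]

    # Random translation: pad the borders, then crop back to the original size.
    pad_width = [(pad, pad), (pad, pad)] + [(0, 0)] * (out.ndim - 2)
    padded = np.pad(out, pad_width, mode="edge")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    out = padded[top:top + height, left:left + width]

    # Random global brightness shift, clipped to the valid intensity range.
    shift = rng.uniform(-max_brightness_shift, max_brightness_shift)
    return np.clip(out + shift, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.random((48, 48, 3))  # stand-in for a real face crop
    batch = np.stack([augment(face, rng) for _ in range(8)])
    print(batch.shape)
```

Each call yields a different variant of the same face, so a small labeled set can supply many distinct training inputs per epoch.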

RELATED WORK

COVARIANCE POOLING
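The abstract describes a covariance pooling layer whose matrix square root is approximated by Newton iteration rather than EIG or SVD. The sketch below shows one way this can work, using the coupled Newton-Schulz form of the iteration; the summary does not say which variant the authors use, so this choice is an assumption. The function names, the (C, N) feature layout, the trace pre-normalization, and the default iteration count are likewise assumptions made for illustration.

```python
import numpy as np


def covariance_pool(features):
    """Return the C x C sample covariance of a (C, N) feature matrix.

    C is the number of channels and N the number of spatial positions
    of the last convolutional feature map (assumed layout).
    """
    positions = features.shape[1]
    centered = features - features.mean(axis=1, keepdims=True)
    return centered @ centered.T / positions


def newton_schulz_sqrt(matrix, num_iters=5):
    """Approximate the square root of a symmetric PSD matrix.

    Uses only matrix products, which is why this style of iteration maps
    well to GPUs. Assumes the matrix has a nonzero trace.
    """
    identity = np.eye(matrix.shape[0])
    trace = np.trace(matrix)

    # Pre-normalize so the eigenvalues lie in (0, 1], as convergence requires.
    y = matrix / trace
    z = identity.copy()
    for _ in range(num_iters):
        t = 0.5 * (3.0 * identity - z @ y)
        y = y @ t  # y converges to the square root of the normalized matrix
        z = t @ z  # z converges to its inverse square root

    # Undo the normalization: sqrt(A) = sqrt(trace) * sqrt(A / trace).
    return np.sqrt(trace) * y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.standard_normal((8, 200))  # 8 channels, 200 positions
    cov = covariance_pool(feature_map)
    root = newton_schulz_sqrt(cov, num_iters=15)
    print(np.abs(root @ root - cov).max())  # residual of the approximation
```

The pooled output is a C x C matrix of channel correlations rather than a C-dimensional mean vector, which is what distinguishes second-order pooling from the first-order pooling it replaces. Fewer iterations trade accuracy for speed.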

EXPERIMENTS

RESULTS AND DISCUSSION

Findings

CONCLUSION