Video-based facial expression recognition (VFER) is a fundamental component of many computer vision applications. Visual features are the key factor in facial expression recognition, yet the gap between visual features and emotions is large. To bridge this gap, the proposed method combines convolutional neural networks (CNNs) with the histogram of oriented gradients (HOG) to obtain a more comprehensive feature for VFER. First, it extracts shallow features from each video frame using a number of convolutional kernels in the CNN; these features are invariant to displacement, scale and deformation. Then, HOG is applied to the CNN's shallow features to extract HOG features that are strongly correlated with facial expressions. Finally, a support vector machine (SVM) performs the facial expression recognition task. Extensive experiments on the RML, CK+ and AFEW5.0 databases show that this framework achieves promising performance and outperforms state-of-the-art methods.
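The feature-extraction stages described above (shallow convolution, then HOG on the resulting feature maps) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the fixed Sobel kernels stand in for learned CNN kernels, the HOG variant is simplified (unsigned orientations, no block normalisation), and all function names, the cell size and the bin count are assumptions chosen for illustration.

```python
import numpy as np

# Assumption: fixed Sobel kernels stand in for the learned CNN kernels.
SOBEL_KERNELS = [
    np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float),
    np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float),
]


def conv2d_valid(image, kernel):
    """Valid-mode 2-D cross-correlation, as in a CNN convolution layer."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + h, j:j + w]
    return out


def shallow_feature_maps(frame, kernels):
    """Apply each kernel and a ReLU to get one shallow feature map per kernel."""
    return [np.maximum(conv2d_valid(frame, k), 0.0) for k in kernels]


def hog_descriptor(fmap, cell=8, bins=9):
    """Simplified HOG: per-cell orientation histograms weighted by magnitude."""
    gy, gx = np.gradient(fmap)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    n_cy, n_cx = fmap.shape[0] // cell, fmap.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            hist[cy, cx] = np.bincount(
                bin_idx[rows, cols].ravel(),
                weights=mag[rows, cols].ravel(),
                minlength=bins,
            )
    vec = hist.ravel()
    # Global L2 normalisation instead of the usual block normalisation.
    return vec / (np.linalg.norm(vec) + 1e-8)


def extract_features(frame, kernels):
    """Concatenate the HOG descriptors of all shallow feature maps."""
    maps = shallow_feature_maps(frame, kernels)
    return np.concatenate([hog_descriptor(m) for m in maps])


if __name__ == "__main__":
    frame = np.random.default_rng(0).random((48, 48))  # synthetic grey frame
    print(extract_features(frame, SOBEL_KERNELS).shape)
```

The per-frame vectors produced this way could then be passed to any SVM implementation for the final classification step; how frames are aggregated over a video is not specified in the abstract and is left out here.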