Abstract

The Facial Expression Recognition (FER) task has attracted considerable interest, particularly in the field of human-computer interaction. Many existing FER methods are based on Convolutional Neural Networks (CNNs). However, the convolutional filters in a CNN rely heavily on spatial locality and cannot learn global facial-expression features in the early layers of the model. As convolutional layers are stacked, the deeper layers can extract global features, but these features are incomplete in most instances, so the capacity of CNN-based models remains insufficient for FER. To address this problem, this paper proposes a FER model composed of Patch-Range-Attention (PRA), a Vision Transformer (ViT), and Squeeze-and-Excitation (SE) modules. The proposed method is evaluated on four public FER datasets, and the results on these four databases show that our PVS model outperforms state-of-the-art models by a large margin.
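Of the three components named above, Squeeze-and-Excitation is a standard channel-attention mechanism; the following is a minimal dependency-free sketch of that general mechanism (global average pooling, a two-layer gating network, then channel rescaling), not the paper's specific PVS implementation, whose PRA details are not given here. The function name and weight layout are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def squeeze_excitation(feature_map, w1, w2):
    """Generic Squeeze-and-Excitation gating on a feature map
    shaped [channels][height][width] (nested lists).

    w1: [reduced][channels] weights of the reduction layer
    w2: [channels][reduced] weights of the expansion layer
    (Biases omitted for brevity.)
    """
    # Squeeze: global average pooling per channel
    squeezed = []
    for ch in feature_map:
        total = sum(sum(row) for row in ch)
        squeezed.append(total / (len(ch) * len(ch[0])))

    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate per channel
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]

    # Scale: reweight every spatial position of each channel by its gate
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(feature_map, gates)]
```

Because each gate passes through a sigmoid, every channel is softly reweighted into (0, 1), letting the network emphasize expression-relevant channels and suppress the rest.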
