Facial expression recognition remains challenging due to factors such as face similarity, image quality, and age variation. Although existing end-to-end Convolutional Neural Network (CNN) architectures have achieved good classification results on facial expression recognition tasks, they share a common drawback: when extracting expression features from an image, a convolutional kernel can only compute correlations between elements within a localized region. This makes it difficult for the network to explore the relationships among all the elements that make up a complete expression. To address this issue, this article proposes a facial expression recognition network called HFE-Net. To capture both the subtle changes in expression features and the overall facial expression information, HFE-Net introduces a Hybrid Feature Extraction Block, which consists of a Feature Fusion Device and Multi-head Self-attention arranged in parallel. The Feature Fusion Device not only extracts local information from expression features but also measures the correlation between distant elements, which helps the network focus more on the target region while enabling information interaction between distant features. Multi-head Self-attention computes the correlation among all elements in the feature map, which helps the network extract the global information of the expression features. Extensive experiments on four publicly available facial expression datasets verify that the proposed Hybrid Feature Extraction Block improves the network's ability to recognize facial expressions.
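The contrast the abstract draws between local convolution and global self-attention can be illustrated with a toy one-dimensional sketch. This is not the HFE-Net implementation: the abstract does not specify the internals of the Feature Fusion Device or how the two parallel branches are merged, so the code below assumes a plain convolution as a stand-in for the local branch (it does not reproduce the Feature Fusion Device's distant-element correlation), a single attention head with identity projections, and element-wise addition as the fusion rule. All function names (`local_conv1d`, `self_attention`, `hybrid_block`) are hypothetical.

```python
import math


def local_conv1d(x, kernel):
    """Same-padded 1-D convolution: each output depends only on a local window."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(x))
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head dot-product self-attention over scalar tokens.

    Assumption: queries, keys, and values are the inputs themselves
    (identity projections), so the scale factor sqrt(d_k) is 1.
    """
    out = []
    for query in x:
        weights = softmax([query * key for key in x])
        out.append(sum(w * value for w, value in zip(weights, x)))
    return out


def hybrid_block(x, kernel):
    """Run the local and global branches in parallel and sum them.

    Assumption: additive fusion; the abstract does not state the fusion rule.
    """
    local_branch = local_conv1d(x, kernel)
    global_branch = self_attention(x)
    return [a + b for a, b in zip(local_branch, global_branch)]


# Toy feature sequence; only the last element differs in the perturbed copy.
features = [0.1, 0.5, -0.2, 0.3, 0.0, 0.4, -0.1, 0.2]
perturbed = features[:-1] + [2.0]
kernel = [0.25, 0.5, 0.25]

conv_a = local_conv1d(features, kernel)
conv_b = local_conv1d(perturbed, kernel)
attn_a = self_attention(features)
attn_b = self_attention(perturbed)

# The convolution output at position 0 cannot see the distant change,
# while the attention output at position 0 responds to it.
print("conv unchanged at position 0:", conv_a[0] == conv_b[0])
print("attention changed at position 0:", attn_a[0] != attn_b[0])
```

Changing only the last element leaves the convolution output at the first position untouched, whereas the attention output at the first position shifts, which is the limited-receptive-field behavior that motivates pairing the two branches.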