Abstract

Facial expression recognition (FER) using deep convolutional neural networks (DCNNs) is important and challenging. Although substantial effort has been made to increase FER accuracy with DCNNs, previous studies are still not sufficiently generalisable for real-world applications. Traditional FER studies are mainly limited to controlled, lab-posed frontal facial images, which lack the challenges of motion blur, head poses, occlusions, face deformations and lighting under uncontrolled conditions. In this work, we propose a SqueezExpNet architecture that exploits both local and global facial information to build a highly accurate FER system that can handle environmental variations. Our network is divided into two stages: a geometrical attention stage with a SqueezeNet-like architecture that captures salient local information, and a spatial texture stage comprising several squeeze-and-expand layers that extracts high-level global features. In particular, we create a weighted mask from 3D face landmarks and multiply it element-wise with the spatial feature maps in the first stage to draw attention to important local facial regions. The face image and its augmentations are then fed into the second stage of the network. Finally, instead of a simple SoftMax classifier, a recurrent neural network is designed to fuse the highlighted information from the two stages, which helps to overcome prediction uncertainties. Experiments covering basic and compound FER tasks were performed on three leading facial expression datasets. Our strategy outperformed existing DCNN methods and achieved state-of-the-art results. The developed architecture, adopted research methodology and reported findings may find applications in real-time FER for surveillance, health and feedback systems.
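To make the dual-stage design concrete, the following minimal PyTorch sketch illustrates one plausible reading of the abstract: a landmark-derived mask multiplied element-wise with stage-one feature maps, a stack of squeeze-and-expand blocks for global texture, and a recurrent head that fuses the two stage descriptors. The module names, channel sizes and the GRU fusion head are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn


def fire_block(in_ch, squeeze_ch, expand_ch):
    # SqueezeNet-style block: 1x1 squeeze followed by a 3x3 expand convolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, squeeze_ch, kernel_size=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )


class GeometricAttentionStage(nn.Module):
    # Stage 1: weight spatial features with a landmark-derived attention mask.
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.features = fire_block(in_ch, 16, feat_ch)
        # Project the single-channel landmark mask to the feature width.
        self.mask_proj = nn.Conv2d(1, feat_ch, kernel_size=1)

    def forward(self, image, landmark_mask):
        feat = self.features(image)                        # B x C x H x W
        mask = torch.sigmoid(self.mask_proj(landmark_mask))
        return feat * mask                                 # element-wise attention


class SpatialTextureStage(nn.Module):
    # Stage 2: stacked squeeze-and-expand layers for global texture features.
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.features = nn.Sequential(
            fire_block(in_ch, 16, feat_ch),
            nn.MaxPool2d(2),
            fire_block(feat_ch, 32, feat_ch),
        )

    def forward(self, image):
        return self.features(image)


class SqueezExpNetSketch(nn.Module):
    # Fuse the two stages with a recurrent head instead of a plain SoftMax classifier.
    def __init__(self, num_classes=7, feat_ch=64):
        super().__init__()
        self.stage1 = GeometricAttentionStage(feat_ch=feat_ch)
        self.stage2 = SpatialTextureStage(feat_ch=feat_ch)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.rnn = nn.GRU(feat_ch, 128, batch_first=True)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, image, landmark_mask):
        local_feat = self.pool(self.stage1(image, landmark_mask)).flatten(1)
        global_feat = self.pool(self.stage2(image)).flatten(1)
        # Treat the two stage descriptors as a length-2 sequence for the GRU.
        seq = torch.stack([local_feat, global_feat], dim=1)
        _, hidden = self.rnn(seq)
        return self.fc(hidden[-1])


if __name__ == "__main__":
    model = SqueezExpNetSketch()
    img = torch.randn(2, 3, 96, 96)
    mask = torch.rand(2, 1, 96, 96)   # landmark-weighted mask at image resolution
    print(model(img, mask).shape)     # torch.Size([2, 7])

In this sketch the recurrent head receives the local and global descriptors as a short sequence, which is one simple way to let it weigh the two stages against each other before classification.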
