Abstract

Facial expression is a key nonverbal medium in human-to-human communication, and facial expression recognition (FER) remains a challenging task in computer vision. With the advent of deep neural networks, facial expression recognition has transitioned from lab-controlled settings to more naturalistic, in-the-wild environments. However, deep neural networks (DNNs) suffer from overfitting and from bias towards the dominant categorical distribution. In FER datasets, the number of samples per category is heavily imbalanced, and the overall number of samples falls far short of what is needed to represent all emotions adequately. In this paper, we propose an end-to-end convolutional self-attention framework for classifying facial emotions. The convolutional neural network (CNN) layers capture the spatial features in a given frame, while the self-attention mechanism obtains spatiotemporal features and performs context modelling. The framework is validated on the AffectNet database, which contains a large number of image samples in in-the-wild settings and is therefore very challenging. The results show a 30% improvement in accuracy over the CNN baseline.
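The following is a minimal sketch, not the authors' released code, of how a convolutional self-attention classifier of this kind might be structured. It assumes PyTorch, eight emotion categories as in AffectNet, and illustrative layer sizes (the class name ConvSelfAttentionFER, the embedding dimension, and the number of heads are hypothetical and not taken from the paper): a CNN backbone extracts spatial features, self-attention over the flattened feature-map positions performs context modelling, and a linear head produces the emotion logits.

```python
# Minimal illustrative sketch (not the authors' implementation), assuming PyTorch
# and the 8 AffectNet emotion categories.
import torch
import torch.nn as nn


class ConvSelfAttentionFER(nn.Module):
    def __init__(self, num_classes: int = 8, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        # CNN backbone: captures spatial features from a single face frame.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Self-attention over spatial positions performs context modelling
        # across locations of the feature map.
        self.attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)                   # (B, C, H, W) spatial features
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H*W, C) sequence of spatial tokens
        attended, _ = self.attention(tokens, tokens, tokens)
        pooled = attended.mean(dim=1)              # average over spatial positions
        return self.classifier(pooled)             # emotion logits


# Usage example with a batch of 224x224 RGB face crops.
model = ConvSelfAttentionFER()
logits = model(torch.randn(4, 3, 224, 224))
print(logits.shape)  # torch.Size([4, 8])
```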
