Abstract

Recognizing facial emotion is important in human communication, particularly non-verbal communication. Despite recent advances in deep learning, facial emotion recognition still lags behind other classification tasks in accuracy. Motivated by human visual perception, in which people recognize facial emotion by combining informative facial regions (i.e., eyebrows, eyes, nose, and mouth) with different weights, we propose a novel facial emotion recognition network. To learn effectively from the informative facial regions, we introduce adaptive patch extraction and region adaptive self-attention schemes. Adaptive patch extraction first selects the informative facial regions based on human facial perception. The region adaptive self-attention scheme then estimates attention weights between the selected regions. Finally, the proposed network recognizes facial emotion by combining the features of the facial regions according to these attention weights. Experimental results show that the proposed network focuses effectively on the informative regions of the human face. Furthermore, a comparison of facial emotion recognition accuracy shows that the proposed network substantially outperforms state-of-the-art methods.
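
The idea of weighting region features by attention can be illustrated with a minimal sketch. This is not the authors' implementation: it uses generic scaled dot-product self-attention over four region feature vectors, and the function name `region_self_attention`, the projection matrices, the feature dimensions, the random inputs, and the column-mean pooling of attention into one weight per region are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def region_self_attention(features, w_q, w_k, w_v):
    """Fuse per-region features with scaled dot-product self-attention.

    features: (R, D) array, one row per facial region.
    w_q, w_k, w_v: (D, d) projection matrices (hypothetical parameters).
    Returns the fused (d,) feature, per-region weights (R,), and the
    full (R, R) attention matrix.
    """
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = softmax(scores, axis=-1)        # each row sums to 1
    attended = attn @ v                    # (R, d) attended region features
    # Assumption: a region's weight is the mean attention it receives.
    region_weights = attn.mean(axis=0)     # (R,), sums to 1
    fused = region_weights @ attended      # (d,) weighted combination
    return fused, region_weights, attn


# Toy input: random vectors stand in for features of the four regions.
rng = np.random.default_rng(0)
regions = ["eyebrows", "eyes", "nose", "mouth"]
feat_dim, attn_dim = 8, 4
features = rng.normal(size=(len(regions), feat_dim))
w_q = rng.normal(size=(feat_dim, attn_dim))
w_k = rng.normal(size=(feat_dim, attn_dim))
w_v = rng.normal(size=(feat_dim, attn_dim))

fused, region_weights, attn = region_self_attention(features, w_q, w_k, w_v)
for name, weight in zip(regions, region_weights):
    print(f"{name}: {weight:.3f}")
```

In a trained network the projections would be learned and the fused feature would feed an emotion classifier; here the weights only show how attention redistributes emphasis across regions.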
