Abstract

Facial expression recognition (FER) is crucial for social communication. However, current studies are limited in how they address differences in facial expression that arise from demographic variation such as race, gender, and age. In this article, we first propose a deeply-supervised attention network (DSAN) to automatically recognize human emotions from facial images. Based on DSAN, we design a two-stage training scheme that takes full advantage of race-, gender-, and age-related information. In our DSAN framework, multi-scale features are leveraged to capture more discriminative information from the deep layers to the shallow layers. Furthermore, we adopt an attention block to highlight the essential local facial characteristics; it performs well when incorporated into the deeply-supervised framework. Finally, we combine the complementary characteristics of multiple convolutional layers in a deeply-supervised manner and ensemble the intermediate predicted scores. Our experimental results show that the proposed framework can (i) effectively integrate demographic information to improve performance on a variety of FER tasks, (ii) learn informative feature representations with a visual explanation by capturing the regions of interest (ROIs), and (iii) achieve superior performance on both posed and spontaneous FER databases, each containing pictures of human facial expressions that vary in gender, age, or race.
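
The final step, ensembling the intermediate predicted scores, can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the example assumes a weighted average of per-branch softmax probabilities with uniform weights by default; the function names `ensemble_scores` and `predict`, and the layout of `branch_logits`, are hypothetical and not taken from the DSAN implementation.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_scores(branch_logits, weights=None):
    """Fuse per-branch predictions by a weighted average of probabilities.

    branch_logits: one list of class logits per supervised layer
                   (hypothetical layout, one entry per side branch).
    weights:       optional per-branch weights; uniform if omitted
                   (an assumption, the fusion rule is not given in the text).
    """
    if weights is None:
        weights = [1.0] * len(branch_logits)
    norm = sum(weights)
    num_classes = len(branch_logits[0])
    fused = [0.0] * num_classes
    for logits, w in zip(branch_logits, weights):
        probs = softmax(logits)
        for k in range(num_classes):
            fused[k] += w * probs[k] / norm
    return fused


def predict(branch_logits, weights=None):
    """Return the index of the class with the highest fused score."""
    fused = ensemble_scores(branch_logits, weights)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Three side branches (shallow, middle, deep) scoring three classes.
    branches = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [1.8, 0.4, 0.2]]
    print(ensemble_scores(branches))
    print(predict(branches))
```

Averaging probabilities rather than raw logits keeps every branch on the same scale, so a single branch with large-magnitude outputs cannot dominate the fused score unless it is given a larger weight.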
