Abstract

Recognizing facial expressions relies on the movement of facial parts (action units) such as the eyes, mouth, and nose. Existing methods use complex subnetworks to learn part-based facial features or train neural networks on extensively perturbed datasets. In contrast, we propose an end-to-end trainable convolutional neural network for facial expression recognition. First, we propose a Local Prediction Penalty to stimulate facial expression recognition research without part-based learning. It penalizes the feature extractor's local predictive power, coercing it to learn coarse-grained features (the general facial expression). The Local Prediction Penalty forces the network to disregard predictive local signals learned from local receptive fields and instead rely on the global facial region. Second, we propose a Spatial Self-Attention method for fine-grained feature representation that learns distinct facial features from pixel positions. The Spatial Self-Attention accumulates attention features at privileged positions without changing the spatial feature dimension. Finally, a classifier carefully combines all learned features (coarse-grained and fine-grained) for a better feature representation. Extensive experiments demonstrate that our proposed methods significantly improve facial expression recognition performance.
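
To make the Spatial Self-Attention idea concrete, the sketch below shows a standard pixel-position self-attention block of the kind the abstract describes: it attends over spatial positions and adds the attended features back onto the input, so the output keeps the same spatial feature dimension. This is a minimal illustration under assumptions; the class name `SpatialSelfAttention`, the channel reduction factor, and the learnable residual weight `gamma` are illustrative choices, not the authors' exact design.

```python
# Minimal spatial self-attention sketch (assumed design, PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSelfAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Query/key projections reduce channels; the value projection keeps
        # them, so the output has the same (C, H, W) shape as the input.
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w  # number of pixel positions
        q = self.query(x).view(b, -1, n).permute(0, 2, 1)   # (B, N, C/r)
        k = self.key(x).view(b, -1, n)                       # (B, C/r, N)
        attn = F.softmax(torch.bmm(q, k), dim=-1)            # (B, N, N)
        v = self.value(x).view(b, c, n)                      # (B, C, N)
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        # Accumulate attended features onto the input without changing
        # its spatial dimensions.
        return self.gamma * out + x

if __name__ == "__main__":
    feats = torch.randn(2, 64, 14, 14)        # batch of CNN feature maps
    block = SpatialSelfAttention(channels=64)
    print(block(feats).shape)                 # torch.Size([2, 64, 14, 14])
```

Because the block is shape-preserving, such a module could be dropped between convolutional stages of a feature extractor; the attended (fine-grained) features would then be combined with the coarse-grained features before the classifier, as the abstract outlines.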
