Abstract

As wearing face masks is becoming an embedded practice due to the COVID-19 pandemic, facial expression recognition (FER) that takes face masks into account is now a problem that needs to be solved. In this paper, we propose a face parsing and vision Transformer-based method to improve the accuracy of face-mask-aware FER. First, in order to improve the precision of distinguishing the unobstructed facial region as well as those parts of the face covered by a mask, we re-train a face-mask-aware face parsing model, based on the existing face parsing dataset automatically relabeled with a face mask and pixel label. Second, we propose a vision Transformer with a cross attention mechanism-based FER classifier, capable of taking both occluded and non-occluded facial regions into account and reweigh these two parts automatically to get the best facial expression recognition performance. The proposed method outperforms existing state-of-the-art face-mask-aware FER methods, as well as other occlusion-aware FER methods, on two datasets that contain three kinds of emotions (M-LFW-FER and M-KDDI-FER datasets) and two datasets that contain seven kinds of emotions (M-FER-2013 and M-CK+ datasets).

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.