Abstract

Because expressions can be ambiguous and annotators are subjective, annotation ambiguity is a serious obstacle for facial expression recognition (FER). Ambiguous annotations occur between similar classes and between dissimilar classes; we call these ambiguity and noise, respectively. Previous state-of-the-art approaches group both categories under uncertainty and adopt uncertainty learning to suppress uncertain samples. However, confusing ambiguous expressions with noisy-label expressions may bias the model toward easy samples and hurt its generalization capability. To solve this problem, we propose a novel approach to mine ambiguity and noise (MAN) in FER datasets. Specifically, we design a co-division module, which divides a dataset into clean, ambiguous, and noisy-label expressions based on the consistency and inconsistency between the predictions of two networks and the given labels. To learn the clean expressions effectively, improve discriminative ability, and avoid memorizing noisy labels, a tri-regularization module applies supervised learning, mutuality learning, and unsupervised learning to the three subsets, respectively. Extensive experiments show that MAN can effectively mine the real ambiguity and noise, and achieves state-of-the-art performance on both synthetic noisy datasets and popular benchmarks.
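The co-division step can be illustrated with a minimal sketch. The abstract does not state the exact division rule, so the rule below is an assumption made for illustration only: a sample is treated as clean when both networks predict the given label, ambiguous when exactly one does, and noisy when neither does. The function name `co_divide` and its arguments are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def co_divide(
    preds_a: Sequence[int],
    preds_b: Sequence[int],
    labels: Sequence[int],
) -> Dict[str, List[int]]:
    """Split sample indices into clean, ambiguous, and noisy subsets.

    Assumed rule (not specified in the abstract):
      clean     -> both networks predict the given label
      ambiguous -> exactly one network predicts the given label
      noisy     -> neither network predicts the given label
    """
    if not (len(preds_a) == len(preds_b) == len(labels)):
        raise ValueError("predictions and labels must have the same length")

    subsets: Dict[str, List[int]] = {"clean": [], "ambiguous": [], "noisy": []}
    for i, (pa, pb, y) in enumerate(zip(preds_a, preds_b, labels)):
        # Count how many of the two networks agree with the given label.
        agreements = int(pa == y) + int(pb == y)
        if agreements == 2:
            subsets["clean"].append(i)
        elif agreements == 1:
            subsets["ambiguous"].append(i)
        else:
            subsets["noisy"].append(i)
    return subsets


# Toy example: class indices predicted by two networks for four samples.
example = co_divide(
    preds_a=[0, 1, 2, 3],
    preds_b=[0, 2, 2, 4],
    labels=[0, 1, 5, 3],
)
```

Under this assumed rule, each subset would then be routed to a different objective, following the abstract: supervised learning for the clean indices, mutuality learning for the ambiguous ones, and unsupervised learning for the noisy ones.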
