Facial Expression Recognition (FER) suffers from misrecognition caused by similarities between expressions. To address this issue, many recent works replace the original annotations with soft labels that reflect expression similarities. However, existing soft label learning (SLL) modules are independent of the FER modules they supervise. In this paper, inspired by automatic control theory, we propose EC-Net, a bias-based soft label learning network for FER. To optimize the FER and SLL modules jointly, EC-Net forms a closed feedback loop between them through a module that measures and transmits the bias between the FER module's predictions and the target labels. Specifically, EC-Net contains three modules: E-subNet, C-subNet, and L-Transmitter. First, E-subNet, i.e., the FER module, acts as the executor: it attempts to converge to the target labels under the supervision of soft labels. Then, L-Transmitter measures the bias between the E-subNet predictions and the target labels, converts the multiple discrete biases into a bias-based label through spectral clustering, and transmits it to C-subNet. Finally, C-subNet, i.e., the SLL module, acts as the controller: it generates soft labels from the bias-based label with a cascaded learner, progressively distinguishes similar expressions, and updates the learned soft labels for E-subNet. Supervised by the bias-based soft labels, E-subNet effectively reduces the dominant bias caused by similar expressions. Extensive experiments on four popular benchmarks demonstrate the effectiveness of applying closed-loop feedback to the FER task.
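To make the feedback loop concrete, the following is a minimal NumPy sketch of one pass through it, and not the EC-Net implementation. All function names (`measure_bias`, `spectral_bipartition`, `soft_labels_from_groups`) and the `smoothing` parameter are hypothetical. The sketch makes three simplifying assumptions: the bias is taken to be the mean predicted distribution per target class, spectral clustering is reduced to a two-way split on the Fiedler vector of the resulting confusion graph, and the learned cascaded learner of C-subNet is replaced by a fixed rule that shares probability mass among classes in the same group.

```python
import numpy as np


def measure_bias(probs, targets, num_classes):
    """Assumed bias measure: row c is the mean prediction for samples labeled c."""
    bias = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        mask = targets == c
        if mask.any():
            bias[c] = probs[mask].mean(axis=0)
    return bias


def spectral_bipartition(bias):
    """Two-way spectral split of the classes (a stand-in for spectral clustering)."""
    # Symmetrize the confusions into an undirected affinity graph without self-loops.
    affinity = (bias + bias.T) / 2.0
    np.fill_diagonal(affinity, 0.0)
    # Unnormalized graph Laplacian L = D - W.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
    _, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs[:, 1] > 0).astype(int)


def soft_labels_from_groups(groups, num_classes, smoothing=0.2):
    """Fixed rule (not a learned model): share `smoothing` mass with same-group classes."""
    soft = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        peers = [k for k in range(num_classes) if k != c and groups[k] == groups[c]]
        if peers:
            soft[c, c] = 1.0 - smoothing
            soft[c, peers] = smoothing / len(peers)
        else:
            soft[c, c] = 1.0
    return soft


# Toy predictions: classes 0/1 are confused with each other, as are classes 2/3.
probs = np.array([
    [0.60, 0.30, 0.05, 0.05],
    [0.30, 0.60, 0.05, 0.05],
    [0.05, 0.05, 0.60, 0.30],
    [0.05, 0.05, 0.30, 0.60],
])
targets = np.array([0, 1, 2, 3])

bias = measure_bias(probs, targets, num_classes=4)       # L-Transmitter: measure
groups = spectral_bipartition(bias)                      # L-Transmitter: cluster
soft = soft_labels_from_groups(groups, num_classes=4)    # C-subNet stand-in
print(groups)
print(soft)
```

In a full training loop, the rows of `soft` would replace the one-hot targets for the next round of E-subNet training, and the bias would be re-measured afterwards, which is what closes the loop. The sign of the Fiedler vector is arbitrary, so the group indices may be swapped between runs of different linear algebra backends; only the grouping itself is meaningful.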