AbstractFacial expression recognition (FER) is one of the popular research topics in computer vision. Most deep learning expression recognition methods perform well on a single dataset, but may struggle in cross‐domain FER applications when applied to different datasets. FER under cross‐dataset also suffers from difficulties such as feature distribution deviation and discriminator degradation. To address these issues, we propose a prototype‐oriented similarity transfer framework (POST) for cross‐domain FER. The bidirectional cross‐attention Swin Transformer (BCS Transformer) module is designed to aggregate local facial feature similarities across different domains, enabling the extraction of relevant cross‐domain features. The dual learnable category prototypes is designed to represent potential space samples for both source and target domains, ensuring enhanced domain alignment by leveraging both cross‐domain and specific domain features. We further introduce the self‐training resampling (STR) strategy to enhance similarity transfer. The experimental results with the RAF‐DB dataset as the source domain and the CK+, FER2013, JAFFE and SFEW 2.0 datasets as the target domains, show that our approach achieves much higher performance than the state‐of‐the‐art cross‐domain FER methods.
Read full abstract7-days of FREE Audio papers, translation & more with Prime
7-days of FREE Prime access