Abstract
Modality alignment can maintain semantic consistency in multi-modal emotion recognition, ensuring that features from different modalities accurately represent emotion-related information in a shared encoding space. However, current alignment models either focus only on local fusion of different modal representations or lack a process for mining modality-specific information. We design a Semantic Alignment network based on Multi-Spatial learning (SAMS) for multi-modal emotion recognition, which achieves both local and global alignment between modalities by using high-level emotion representations of different modalities as supervisory signals. SAMS builds a multi-spatial learning framework for each modality and constructs a self-modal interaction module under this framework based on cross-modal semantic learning. Specifically, SAMS provides two learning spaces for each modality: one to capture the affective information specific to that modality, and the other to learn semantic knowledge from the other modalities. The features of these two spaces are then aligned at the temporal and utterance levels through homologous encoding and distinct target constraints. Building on the alignment properties of these two spaces, a self-modal interaction module derives the fused representation by exploring the global correlation between the aligned features in unimodal multi-spatial learning. In experiments, our proposed model yields consistent improvements on two standard multi-modal benchmarks and outperforms state-of-the-art approaches. The code of our SAMS is available at: https://github.com/xiaomi1024/code_SAMS.
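To make the two-space idea concrete, the following is a minimal PyTorch sketch of one modality branch: a modality-specific encoder, a cross-modal semantic encoder, illustrative temporal- and utterance-level alignment terms, and a simple attention-based self-modal interaction. All module choices, dimensions, and loss forms here are assumptions for illustration only, not the released SAMS implementation (see the repository linked above for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnimodalMultiSpace(nn.Module):
    """Illustrative two-space branch for a single modality:
    a modality-specific space, a cross-modal semantic space, and a
    self-modal interaction that fuses them via global correlation."""

    def __init__(self, in_dim: int, hid_dim: int = 128):
        super().__init__()
        # Space 1: encodes emotion cues specific to this modality.
        self.specific_enc = nn.GRU(in_dim, hid_dim, batch_first=True)
        # Space 2: encodes semantics to be aligned with other modalities.
        self.semantic_enc = nn.GRU(in_dim, hid_dim, batch_first=True)
        # Self-modal interaction: one space attends over the other.
        self.attn = nn.MultiheadAttention(hid_dim, num_heads=4, batch_first=True)

    def forward(self, x: torch.Tensor):
        # x: (batch, time, in_dim) unimodal feature sequence
        spec_seq, _ = self.specific_enc(x)   # modality-specific space
        sem_seq, _ = self.semantic_enc(x)    # cross-modal semantic space

        # Temporal-level alignment: keep the two homologous encodings
        # consistent frame by frame (illustrative MSE constraint).
        temporal_align = F.mse_loss(spec_seq, sem_seq.detach())

        # Utterance-level alignment on pooled summaries (illustrative cosine term).
        spec_utt, sem_utt = spec_seq.mean(dim=1), sem_seq.mean(dim=1)
        utterance_align = 1.0 - F.cosine_similarity(spec_utt, sem_utt).mean()

        # Self-modal interaction: semantic space queries the specific space,
        # producing a fused unimodal representation.
        fused, _ = self.attn(sem_seq, spec_seq, spec_seq)
        return fused.mean(dim=1), temporal_align + utterance_align


if __name__ == "__main__":
    # Toy usage: one modality (e.g., acoustic features), batch of 4 utterances.
    branch = UnimodalMultiSpace(in_dim=74)
    feats = torch.randn(4, 50, 74)
    fused_repr, align_loss = branch(feats)
    print(fused_repr.shape, align_loss.item())
```

In a full model, one such branch per modality would feed its fused representation into a multi-modal classifier, with the alignment terms added to the recognition loss; the exact supervisory signals and constraint forms used by SAMS are described in the paper itself.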