Abstract
Multimodal sentiment analysis (MSA) integrates textual, visual, and audio information from videos to accurately identify human emotional states. This study proposes a multimodal feature decoupling strategy that separates sentiment features into common and private features. The private features capture the uniqueness of each modality, increasing feature diversity; the common features capture commonalities across modalities, reducing potential information loss during decoupling. To achieve this, we design dedicated encoders and loss-function constraints for both types of features. Additionally, to mitigate information redundancy and prevent the loss of key information during decoupled representation learning, we introduce a dual feature reconstruction mechanism comprising unimodal feature reconstruction (UFR) and multimodal feature reconstruction (MFR), which together preserve vital information from the decoupling process and reduce the impact of redundant data. Extensive experiments on three datasets show that our method improves accuracy by approximately 1%–3%, significantly outperforming existing state-of-the-art techniques.
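The abstract does not specify implementation details, but the common/private decoupling with encoder and loss constraints can be sketched in a minimal form. The following NumPy sketch is purely illustrative: the encoder weights, feature dimensions, and the specific similarity, difference, and reconstruction losses are assumptions standing in for the paper's actual architecture, with simple linear-plus-tanh encoders as placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_feat = 8, 4  # hypothetical input and feature dimensions

def encode(x, W):
    # placeholder encoder: one linear layer with tanh nonlinearity
    return np.tanh(x @ W)

# hypothetical per-modality input vectors (text, audio, video)
inputs = {m: rng.normal(size=(d_in,)) for m in ("text", "audio", "video")}

# one shared encoder for common features, one private encoder per modality
W_common = rng.normal(size=(d_in, d_feat))
W_private = {m: rng.normal(size=(d_in, d_feat)) for m in inputs}

common = {m: encode(x, W_common) for m, x in inputs.items()}
private = {m: encode(x, W_private[m]) for m, x in inputs.items()}

def similarity_loss(feats):
    # pull common features of different modalities toward each other
    ms = list(feats)
    return np.mean([np.sum((feats[a] - feats[b]) ** 2)
                    for i, a in enumerate(ms) for b in ms[i + 1:]])

def difference_loss(c, p):
    # push each modality's private features away from its common features
    # (squared inner product as a soft orthogonality penalty)
    return np.mean([(c[m] @ p[m]) ** 2 for m in c])

# unimodal feature reconstruction (UFR sketch): decode the concatenated
# common+private features back toward the original modality input
W_dec = {m: rng.normal(size=(2 * d_feat, d_in)) for m in inputs}
recon = {m: np.concatenate([common[m], private[m]]) @ W_dec[m] for m in inputs}
ufr_loss = np.mean([np.mean((recon[m] - inputs[m]) ** 2) for m in inputs])
```

In a trainable version, the similarity, difference, and reconstruction terms would be summed (with weighting coefficients) into the overall objective alongside the sentiment prediction loss, so that gradients jointly shape the common and private feature spaces.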