Abstract
In multimodal sentiment analysis (MSA), the strategy used to fuse multimodal features strongly influences model performance. Previous works often struggle to integrate heterogeneous data without fully exploiting the rich semantic content of text, resulting in weak cross-modal information association. We propose TeD-MI, an MSA model based on Text-Driven Crossmodal Fusion and Mutual Information Estimation. TeD-MI comprises a Stacked Text-Driven Crossmodal Fusion (STDC) module, which efficiently fuses the three modalities under the guidance of the text modality to optimize the fused feature representation and enhance semantic understanding. Furthermore, TeD-MI incorporates a mutual information estimation module designed to balance preserving task-related information against filtering out irrelevant noise. Comprehensive experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate that the proposed model achieves varying degrees of improvement across most evaluation metrics.
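To make the text-driven fusion idea concrete, the following is a minimal, hypothetical PyTorch sketch of what a stacked text-driven crossmodal fusion block could look like; it is not the authors' released implementation, and the class names (`TextDrivenCrossmodalLayer`, `StackedTextDrivenFusion`), layer counts, and dimensions are illustrative assumptions based only on the abstract's description of text features guiding the fusion of the three modalities.

```python
# Hypothetical sketch (not the paper's code): text features act as queries
# that attend to audio and visual features, and the layer is stacked so the
# text-driven fusion is refined over several rounds.
import torch
import torch.nn as nn


class TextDrivenCrossmodalLayer(nn.Module):
    """One fusion layer: text queries attend to audio and visual keys/values."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.text_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.text_to_visual = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, text, audio, visual):
        # The text modality drives both attention streams (queries come from text).
        t2a, _ = self.text_to_audio(text, audio, audio)
        t2v, _ = self.text_to_visual(text, visual, visual)
        fused = self.ffn(torch.cat([t2a, t2v], dim=-1))
        return self.norm(text + fused)  # residual connection keeps text semantics central


class StackedTextDrivenFusion(nn.Module):
    """Stack of fusion layers, loosely mirroring the STDC idea in the abstract."""

    def __init__(self, dim: int, depth: int = 3):
        super().__init__()
        self.layers = nn.ModuleList(TextDrivenCrossmodalLayer(dim) for _ in range(depth))

    def forward(self, text, audio, visual):
        for layer in self.layers:
            text = layer(text, audio, visual)
        return text


# Toy usage with sequences of different lengths projected to a shared dimension.
if __name__ == "__main__":
    batch, dim = 2, 64
    text = torch.randn(batch, 20, dim)    # token-level text features
    audio = torch.randn(batch, 50, dim)   # frame-level acoustic features
    visual = torch.randn(batch, 30, dim)  # frame-level visual features
    model = StackedTextDrivenFusion(dim)
    print(model(text, audio, visual).shape)  # torch.Size([2, 20, 64])
```

In this sketch the fused representation stays aligned with the text sequence, which is one plausible way to realize "fusion driven by the text modality"; the paper's mutual information estimation module, which filters noise from the fused features, is not shown here.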