Abstract

Multimodal sentiment analysis (MSA) is important for quickly and accurately understanding people's attitudes and opinions about an event. However, existing sentiment analysis methods suffer from the dominant contribution of the text modality in datasets, a phenomenon called text dominance. We therefore argue that weakening the dominant role of the text modality is important for MSA tasks. We address this problem from two perspectives. From the dataset perspective, we first propose the Chinese multimodal opinion-level sentiment intensity (CMOSI) dataset. We randomly collected 144 real videos from the Bilibili video site and manually edited 2557 emotion-bearing clips from them. Three versions of the dataset were constructed, with subtitles produced by manual proofreading, by machine speech transcription, and by human cross-language translation, respectively. The latter two versions substantially weaken the dominant role of the text modality. From the network-modeling perspective, we propose a multimodal semantic enhancement network (MSEN) based on a multi-head attention mechanism that takes advantage of the multiple versions of the CMOSI dataset. Experiments on CMOSI show that the network performs best on the text-unweakened version of the dataset, and its performance loss is minimal on both text-weakened versions, indicating that the network can fully exploit the latent semantics in nontext modalities. In addition, we conducted generalization experiments with MSEN on the MOSI, MOSEI, and CH-SIMS datasets; the results show that our approach is also highly competitive and robust across languages.
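The abstract states only that MSEN is built on a multi-head attention mechanism; its actual architecture is not described here. As a generic illustration, and not a reconstruction of MSEN, the following sketch shows standard multi-head cross-attention, in which features of one nontext modality (hypothetically audio) query another (hypothetically visual). This is one common way attention-based fusion can draw on nontext signals. The function names, feature dimensions, and randomly initialized projection matrices are assumptions for illustration only; a trained model would learn these projections.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def init_weight(d_in: int, d_out: int, rng: random.Random) -> Matrix:
    """Random projection matrix standing in for learned parameters (assumption)."""
    scale = 1.0 / math.sqrt(d_in)
    return [[rng.gauss(0.0, scale) for _ in range(d_out)] for _ in range(d_in)]


def multi_head_cross_attention(
    query_seq: Matrix,    # hypothetical audio features, shape (n_q, d_model)
    context_seq: Matrix,  # hypothetical visual features, shape (n_c, d_model)
    num_heads: int,
    rng: random.Random,
) -> tuple[Matrix, list[Matrix]]:
    """Generic multi-head cross-attention; returns fused features and per-head weights."""
    d_model = len(query_seq[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    w_q, w_k, w_v, w_o = (init_weight(d_model, d_model, rng) for _ in range(4))

    # Project queries from one modality, keys and values from the other.
    q = matmul(query_seq, w_q)
    k = matmul(context_seq, w_k)
    v = matmul(context_seq, w_v)

    head_outputs, head_weights = [], []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        q_h = [row[cols] for row in q]
        k_h = [row[cols] for row in k]
        v_h = [row[cols] for row in v]
        # Scaled dot-product scores: every query row against every context row.
        scores = matmul(q_h, [list(c) for c in zip(*k_h)])
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_weights.append(weights)
        head_outputs.append(matmul(weights, v_h))

    # Concatenate heads along the feature axis, then apply the output projection.
    concat = [[x for head in head_outputs for x in head[i]] for i in range(len(query_seq))]
    return matmul(concat, w_o), head_weights


if __name__ == "__main__":
    rng = random.Random(0)
    audio = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]   # 5 frames
    visual = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]  # 3 frames
    fused, weights = multi_head_cross_attention(audio, visual, num_heads=2, rng=rng)
    print(len(fused), len(fused[0]))  # 5 8
```

Each audio frame receives a fused vector of the same width as its input, built from a weighted mix of visual frames; each head's attention weights form a probability distribution over the visual frames, so different heads can attend to different parts of the nontext context.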
