Abstract
Multimodal sentiment analysis (MSA) aims to infer emotions from linguistic, auditory, and visual sequences. Effective multimodal representation and fusion techniques are key to MSA, yet fully capturing the interactions among heterogeneous data remains difficult. To address this problem, a new framework, the dynamic invariant-specific representation fusion network (DISRFN), is proposed in this study. First, to make effective use of redundant information, joint domain-separation representations of all modalities are obtained through an improved joint domain separation network. Then, a hierarchical graph fusion network (HGFN) dynamically fuses these representations to capture the cross-modal interactions that guide sentiment analysis. Moreover, comparative experiments are performed on the popular MSA datasets MOSI and MOSEI, together with studies of fusion strategy, loss-function ablation, and similarity-loss selection. The experimental results verify the effectiveness of the DISRFN framework and its loss function.
Highlights
Multimodal sentiment analysis (MSA), as an emerging field of natural language processing (NLP), aims to infer the speaker’s emotion by exploring clues in multimodal information [1,2,3]
A dynamic fusion mechanism is established to fuse the modal features and obtain their interactive information. This study mainly aims to explore a sentiment analysis framework based on multimodal representation learning and a dynamic fusion method
Experiments on fusion strategy, loss-function ablation, and similarity-loss selection are designed
Summary
Multimodal sentiment analysis (MSA), as an emerging field of natural language processing (NLP), aims to infer the speaker’s emotion by exploring clues in multimodal information [1,2,3]. This study mainly aims to explore a sentiment analysis framework based on multimodal representation learning and a dynamic fusion method. The DSN is improved and adopted to perform multimodal sentiment analysis tasks in this paper; it is named the improved joint domain separation network (improved JDSN). The improved JDSN is adopted to learn the joint modality-invariant and modality-specific representations of all modalities in a common–special subspace. In early work, modal interactions were mostly obtained by feature-concatenation fusion [14]; such methods cannot dynamically adjust the contribution of each modality during fusion. (1) A multimodal sentiment analysis framework (DISRFN) is proposed in this study; it performs dynamic fusion of the various representations while emphasizing the learning of invariant and specific joint representations of each modality.
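The invariant/specific split described above can be illustrated with a minimal sketch. The linear encoders, subspace sizes, and the particular squared-distance similarity loss and dot-product difference loss used here are illustrative assumptions for a domain-separation-style model, not the paper’s exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_sub = 32, 8  # hypothetical feature and subspace dimensions

# Toy per-modality feature vectors standing in for encoded sequences.
modalities = {"text": rng.normal(size=d_in),
              "audio": rng.normal(size=d_in),
              "video": rng.normal(size=d_in)}

# One shared (modality-invariant) projection for all modalities,
# and one private (modality-specific) projection per modality.
W_shared = rng.normal(size=(d_sub, d_in)) / np.sqrt(d_in)
W_private = {m: rng.normal(size=(d_sub, d_in)) / np.sqrt(d_in)
             for m in modalities}

shared = {m: W_shared @ x for m, x in modalities.items()}
private = {m: W_private[m] @ x for m, x in modalities.items()}

# Similarity loss: pull the invariant representations of the
# modalities toward each other (pairwise squared distance).
names = list(modalities)
sim_loss = np.mean([np.sum((shared[a] - shared[b]) ** 2)
                    for i, a in enumerate(names) for b in names[i + 1:]])

# Difference loss: keep each specific representation distinct from
# its invariant counterpart (penalize their overlap).
diff_loss = np.mean([(shared[m] @ private[m]) ** 2 for m in names])

# Joint representation handed on to the (dynamic) fusion stage.
joint = np.concatenate([np.concatenate([shared[m], private[m]])
                        for m in names])
print(joint.shape)  # (48,)
```

In a trained network the two losses would be minimized jointly with the task loss, so the shared subspace captures sentiment cues common to all modalities while the private subspaces retain what is unique to each.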