Abstract

Social media comments are no longer limited to a single textual modality; they are heterogeneous data spanning multiple modalities, such as vision, sound, and text, which is why multimodal sentiment analysis strategies have been introduced. However, most current multimodal sentiment analysis models employ the Transformer architecture because of its strong impact and benefits, which increases resource overhead. In this paper, a multimodal attention fusion (MAF) network model is proposed for sentiment analysis of multimodal data. MAF consists mainly of a cross-attention unit and a residual unit. The cross-attention unit selects one of the three modalities as the core modality, while the remaining modalities serve as base modalities. The core modality is then combined with the base modality information to facilitate significant interaction between the two modalities, resulting in three sets of pairwise attention computations. Moreover, a residual unit is employed to integrate the overall information into the attentional information. This approach not only enables modality-to-modality interaction but also supplements the overall information. Finally, experiments are conducted on two publicly available multimodal sentiment analysis datasets from Carnegie Mellon University (CMU), CMU-MOSEI (abbreviated as MOSEI) and CMU-MOSI (abbreviated as MOSI), to validate that the method achieves high performance while removing the complex structure, and that, in an ordinary hardware environment, it is comparable to state-of-the-art (SOTA) models run on high-performance A100 and V100 graphics processing units (GPUs).
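The pairwise attention and residual steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation; the abstract does not specify the details, so several points are assumptions made purely for illustration: the three core/base pairings (text-audio, text-vision, audio-vision), plain scaled dot-product attention without learned projections, mean pooling over time, and an "overall information" vector taken as the average of the pooled modality features. The function names `cross_attention` and `maf_sketch` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum along the axis for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(core, base):
    """Scaled dot-product attention with queries from the core modality
    and keys/values from the base modality.

    core: (T_core, d), base: (T_base, d) -> returns (T_core, d).
    Learned query/key/value projections are omitted (assumption).
    """
    d = core.shape[-1]
    scores = core @ base.T / np.sqrt(d)      # (T_core, T_base)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ base                    # (T_core, d)


def maf_sketch(text, audio, vision):
    """Hypothetical fusion: three pairwise attentions plus a residual."""
    # Assumed "overall information": average of the time-pooled features.
    overall = np.mean([m.mean(axis=0) for m in (text, audio, vision)], axis=0)

    # Assumed pairing of (core, base) modalities.
    pairs = [(text, audio), (text, vision), (audio, vision)]

    fused = []
    for core, base in pairs:
        attended = cross_attention(core, base).mean(axis=0)  # pool to (d,)
        fused.append(attended + overall)                      # residual add
    return np.concatenate(fused)                              # (3 * d,)


# Toy inputs: different sequence lengths, shared feature size d = 8.
rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))
audio = rng.normal(size=(7, 8))
vision = rng.normal(size=(6, 8))
print(maf_sketch(text, audio, vision).shape)  # (24,)
```

The sketch has no stacked Transformer layers: each pair costs one attention matrix of size T_core by T_base, which is in the spirit of the lighter structure the abstract describes, although the actual model may differ in every assumed detail.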
