Abstract

Multimodal sentiment analysis aims to harvest people’s opinions or attitudes from multimedia data through fusion techniques. However, existing fusion methods fail to exploit the correlations among modalities and instead introduce interfering factors. In this paper, we propose an Interactive Transformer and Soft Mapping based method for multimodal sentiment analysis. In the Interactive Transformer layer, an Interactive Multihead Guided-Attention structure composed of a pair of Multihead Attention modules is first used to find the mapping relationships between modalities. The results are then fed into a feedforward neural network. Finally, the Soft Mapping layer, consisting of stacked Soft Attention modules, maps the results to a higher dimension to fuse the multimodal information. The proposed model fully considers the relationships among the modalities and offers a new solution to the data-interaction problem in multimodal sentiment analysis. Evaluated on the benchmark datasets CMU-MOSEI and MELD, our model improves accuracy by 5.57% over the baseline.
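
To make the described pipeline concrete, below is a minimal PyTorch sketch of the pairing idea behind the Interactive Multihead Guided-Attention: each modality's queries attend over the other modality's keys and values, and each guided result then passes through a feedforward network. The class name `InteractiveGuidedAttention` and all dimensions are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class InteractiveGuidedAttention(nn.Module):
    """Hypothetical sketch: a pair of multihead attention modules in which
    each modality is guided by the other, followed by feedforward networks."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn_x = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_y = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn_x = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                   nn.Linear(4 * dim, dim))
        self.ffn_y = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                   nn.Linear(4 * dim, dim))

    def forward(self, x, y):
        # Queries come from one modality, keys/values from the other,
        # so each stream is "guided" by its partner modality.
        x_guided, _ = self.attn_x(x, y, y)
        y_guided, _ = self.attn_y(y, x, x)
        return self.ffn_x(x_guided), self.ffn_y(y_guided)
```

For example, with text features of shape (batch, seq_len, 512) and audio features of the same width, `InteractiveGuidedAttention()(text, audio)` returns one guided representation per modality.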

Highlights

  • Sentiment analysis aims to detect affective states or subjective information from data

  • Traditional sentiment analysis mainly focuses on text data, using statistical knowledge combined with natural language processing and machine learning techniques to study and analyze the sentiment polarity of sentences or documents [1]

  • We utilize the Guided-Attention mechanism [13] to introduce information from the other modalities. Then, the unimodal results are mapped to higher dimensions for fusion, and the final decision is made according to the fusion result. The main contributions of this work can be summarized as follows: (i) We propose a multimodal sentiment analysis model based on Interactive Transformer and Soft Mapping. This model can achieve an optimal decision for each modality and fully considers the correlation information between different modalities (a fusion sketch follows this list)
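
The higher-dimensional mapping and fusion step mentioned above can be illustrated with a soft-attention sketch; the class `SoftMappingFusion`, the projection width, and the modality count are hypothetical placeholders rather than the paper's exact layer.

```python
import torch
import torch.nn as nn

class SoftMappingFusion(nn.Module):
    """Hypothetical sketch: project each unimodal summary into a shared
    higher-dimensional space and fuse them with soft attention weights."""

    def __init__(self, in_dim=512, fused_dim=1024, n_modalities=3):
        super().__init__()
        self.project = nn.ModuleList(
            [nn.Linear(in_dim, fused_dim) for _ in range(n_modalities)])
        self.score = nn.Linear(fused_dim, 1)  # one soft score per modality

    def forward(self, reps):
        # reps: list of (batch, in_dim) unimodal representations
        projected = torch.stack(
            [proj(r) for proj, r in zip(self.project, reps)], dim=1)
        weights = torch.softmax(self.score(torch.tanh(projected)), dim=1)
        return (weights * projected).sum(dim=1)  # (batch, fused_dim)
```

A final linear classifier over the fused vector would then produce the sentiment decision.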


Summary

Introduction

Sentiment analysis aims to detect affective states or subjective information from data. The blue section in Figure 1 shows the decision-level fusion mechanism, which first conducts sentiment analysis on each modality independently and then fuses the results to obtain the final decision [7, 8] (an illustrative late-fusion function is sketched below). For feature-level interaction, multimodal features must be aligned in a shared space, crossing the semantic barriers between different domains, to achieve semantic fusion. Several methods, such as feature concatenation [9, 10] and attention mechanisms [11], have been developed to solve this problem. In view of these two problems, it can be seen that, in the process of multimodal information fusion, making full use of the correlations among modalities so that each modality can learn from the others is the key to multimodal sentiment analysis.
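
For contrast with the feature-level approaches above, here is an illustrative decision-level (late) fusion function: each unimodal classifier votes with its class scores, and a weighted average yields the final decision. The function name and uniform default weights are assumptions for demonstration.

```python
import torch

def decision_level_fusion(logits_per_modality, weights=None):
    """Illustrative late fusion: combine independent unimodal class
    scores by a weighted average of their probabilities."""
    stacked = torch.stack(logits_per_modality)           # (M, batch, classes)
    if weights is None:                                  # default: equal votes
        weights = torch.full((len(logits_per_modality),),
                             1.0 / len(logits_per_modality))
    probs = torch.softmax(stacked, dim=-1)
    fused = (weights.view(-1, 1, 1) * probs).sum(dim=0)  # (batch, classes)
    return fused.argmax(dim=-1)                          # final decision
```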

Related Work
Data Preparation
Baselines
Experiments
Findings
Conclusions and Future Work