Abstract

With the development of multimodal sentiment analysis, target-level (aspect-level) multimodal sentiment analysis has received increasing attention; it aims to judge the sentiment orientation of a target word using both visual and textual information. Most existing methods rely on combining the whole image with the text while ignoring the implicit affective regions in the image. We introduce a novel affective region recognition and fusion network (ARFN) for target-level multimodal sentiment classification that focuses on aligning and fusing the visual and textual modalities. First, to produce a visual representation that carries sentiment cues, ARFN employs the YOLOv5 algorithm to extract object regions from the image and selects the affective regions according to a selection strategy. Next, the method learns target-sensitive visual representations and textual semantic representations through a multi-head attention mechanism and the pre-trained BERT model, respectively. Finally, ARFN fuses the textual and visual representations through a multimodal interaction module to perform target-level multimodal sentiment classification. Our approach achieves state-of-the-art performance on two publicly available multimodal Twitter datasets, and the experimental results demonstrate its effectiveness.
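As a rough illustration of the pipeline sketched in the abstract, the following is a minimal fusion-stage sketch assuming that affective region features (e.g., pooled YOLOv5 detections) and BERT token embeddings are computed upstream; all module names, dimensions, and the pooling scheme are hypothetical and do not reflect the authors' implementation.

```python
import torch
import torch.nn as nn


class ARFNFusionSketch(nn.Module):
    """Illustrative sketch of target-sensitive visual/textual fusion.

    Assumes BERT token embeddings (sentence + target) and features of the
    selected affective regions are provided by upstream components.
    """

    def __init__(self, text_dim=768, region_dim=2048, hidden_dim=768,
                 num_heads=8, num_classes=3):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.region_proj = nn.Linear(region_dim, hidden_dim)
        # Target-sensitive visual representation: text/target tokens attend
        # to the affective region features via multi-head attention.
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads,
                                                batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_tokens, region_feats):
        # text_tokens:  (B, L, text_dim)   BERT outputs for sentence + target
        # region_feats: (B, R, region_dim) features of selected affective regions
        text = self.text_proj(text_tokens)                      # (B, L, H)
        regions = self.region_proj(region_feats)                # (B, R, H)
        visual, _ = self.cross_attn(text, regions, regions)     # (B, L, H)
        # Simple [CLS]-style pooling and concatenation as the interaction step.
        fused = torch.cat([text[:, 0], visual[:, 0]], dim=-1)   # (B, 2H)
        return self.classifier(fused)                           # sentiment logits


if __name__ == "__main__":
    model = ARFNFusionSketch()
    logits = model(torch.randn(2, 32, 768), torch.randn(2, 5, 2048))
    print(logits.shape)  # torch.Size([2, 3])
```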
