Abstract

Aspect-based multimodal sentiment analysis (ABMSA) is an important branch of multimodal sentiment analysis. The goal of ABMSA is to use multimodal information to infer users’ sentiment polarity toward a targeted aspect and thereby support downstream decision-making. Existing ABMSA methods usually focus on exploring aspect-aware fine-grained interactions and demonstrate the benefits of integrating multimodal information. However, such approaches still suffer from the following limitations: (1) coarse-grained semantic information extraction is ignored, (2) the relevance between image and aspect is ignored, and (3) model-agnostic techniques for improving multimodal representations are not employed. To address these limitations, we propose a novel multimodal approach named the multi-grained fusion network with self-distillation (MGFN-SD). The approach comprises unimodal feature extraction, multi-grained representation learning, and self-distillation-based sentiment prediction. The multi-grained representation learning module extracts fine- and coarse-grained interactions, using aspect–image relevance computation and similarity calculation to dynamically filter potential noise introduced by the original image. In the sentiment prediction module, self-distillation transfers knowledge from both hard and soft labels to supervise the training of each student classifier, improving the quality of the multimodal representations. Experimental results on three benchmark datasets demonstrate the superiority, rationality, and robustness of MGFN-SD.
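
The abstract does not give the exact loss formulation, but the described self-distillation objective (supervising each student classifier with both ground-truth hard labels and a teacher's softened soft labels) is commonly implemented as a weighted sum of cross-entropy and temperature-scaled KL divergence. The following PyTorch sketch only illustrates that general idea; the function name and the temperature and alpha hyperparameters are illustrative assumptions, not the paper's specification.

    import torch
    import torch.nn.functional as F

    def self_distillation_loss(student_logits, teacher_logits, labels,
                               temperature=2.0, alpha=0.5):
        # Hard-label term: standard cross-entropy against the ground-truth
        # sentiment classes.
        hard_loss = F.cross_entropy(student_logits, labels)

        # Soft-label term: KL divergence between the student's and the
        # teacher's temperature-softened class distributions.
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        soft_loss = F.kl_div(log_student, soft_targets,
                             reduction="batchmean") * temperature ** 2

        # alpha balances hard- and soft-label supervision (assumed weighting).
        return alpha * hard_loss + (1.0 - alpha) * soft_loss

    # Minimal usage example with random inputs (3-class sentiment, batch of 4).
    student = torch.randn(4, 3)
    teacher = torch.randn(4, 3)
    gold = torch.tensor([0, 2, 1, 1])
    loss = self_distillation_loss(student, teacher, gold)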
