Abstract

Attention-based networks have recently demonstrated their effectiveness in multimodal sentiment analysis. However, existing methods ignore the redundancy of auxiliary modalities. More importantly, they attend only to top-down attention (a static process) or bottom-up attention (an implicit process), leading to coarse-grained multimodal sentiment contexts. In this paper, we first propose a multimodal dynamic enhanced block in the preprocessing stage to capture the intra-modality sentiment context; this effectively reduces the intra-modality redundancy of the auxiliary modalities. We further propose a bi-direction attention block that captures fine-grained multimodal sentiment context via a novel bi-direction multimodal dynamic routing mechanism. Specifically, the bi-direction attention block first highlights the explicit, low-level multimodal sentiment context, then feeds this low-level context into a carefully designed bi-direction multimodal dynamic routing procedure, which dynamically updates and investigates higher-level, much finer-grained multimodal sentiment contexts. Experiments demonstrate that our fusion network achieves state-of-the-art performance; notably, our model outperforms the best baseline on the 'Acc-7' metric by 6.9%.
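To make the routing idea concrete, below is a minimal sketch of one way bi-direction dynamic routing between a primary (text) modality and an auxiliary (audio/visual) modality could be implemented. It adapts routing-by-agreement to run in both directions; all function names, tensor shapes, and the iteration scheme are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: bi-direction dynamic routing between two modalities.
# Names, shapes, and the agreement update are illustrative assumptions only.
import torch
import torch.nn.functional as F


def dynamic_routing(queries, values, n_iters=3):
    """Routing-by-agreement: iteratively reweight value vectors toward
    the queries they agree with, yielding a refined context per query.

    queries: (batch, n_q, d)   values: (batch, n_v, d)
    returns: (batch, n_q, d) routed context
    """
    # Routing logits start uniform and are updated by agreement each round.
    logits = torch.zeros(queries.size(0), queries.size(1), values.size(1),
                         device=queries.device)
    for _ in range(n_iters):
        weights = F.softmax(logits, dim=-1)            # (b, n_q, n_v)
        context = torch.bmm(weights, values)           # (b, n_q, d)
        # Agreement between the current context and each value vector
        # sharpens consistent routes and suppresses redundant ones.
        logits = logits + torch.bmm(context, values.transpose(1, 2))
    return context


def bidirectional_routing(text, aux, n_iters=3):
    """Route in both directions and return both refined contexts:
    top-down (text attends to auxiliary) and bottom-up (auxiliary to text)."""
    text_ctx = dynamic_routing(text, aux, n_iters)     # top-down
    aux_ctx = dynamic_routing(aux, text, n_iters)      # bottom-up
    return text_ctx, aux_ctx


if __name__ == "__main__":
    text = torch.randn(2, 20, 64)   # e.g. 20 word-level text features
    aux = torch.randn(2, 50, 64)    # e.g. 50 frame-level audio features
    t_ctx, a_ctx = bidirectional_routing(text, aux)
    print(t_ctx.shape, a_ctx.shape)  # (2, 20, 64) and (2, 50, 64)
```

In this sketch, the iterative agreement update plays the role of the "dynamic" refinement described above: each round re-estimates which cross-modal features are relevant, rather than fixing attention weights in a single static pass.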
