Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance

Dong Zhang, Hanqian Wu, Guodong Zhou, Suzhong Wei, Shoushan Li, Qiaoming Zhu

doi:10.1609/aaai.v35i16.17687

Abstract

Multi-modal named entity recognition (MNER) aims to discover named entities in free text and classify them into pre-defined types with the help of accompanying images. However, dominant MNER models do not fully exploit the fine-grained semantic correspondences between semantic units of different modalities, which have the potential to refine multi-modal representation learning. To address this issue, we propose a unified multi-modal graph fusion (UMGF) approach for MNER. Specifically, we first represent the input sentence and image using a unified multi-modal graph, which captures various semantic relationships between multi-modal semantic units (words and visual objects). Then, we stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, we obtain an attention-based multi-modal representation for each word and perform entity labeling with a CRF decoder. Experiments on two benchmark datasets demonstrate the superiority of our MNER model.
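To make the first step of the abstract concrete, the following is a minimal sketch of how a unified graph over words and visual objects might be assembled. The abstract does not specify the edge rules, so the choices here are assumptions: intra-modal edges fully connect nodes of the same modality, and inter-modal edges connect a word to a visual object it is aligned with. The function names, the example sentence, the object labels, and the alignment pairs are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def build_multimodal_graph(words, objects, alignments):
    """Build an undirected graph whose nodes are words and visual objects.

    words:      list of tokens in the sentence
    objects:    list of visual object labels (hypothetical detector output)
    alignments: list of (word_index, object_index) pairs (hypothetical grounding)
    """
    nodes = [("word", i) for i in range(len(words))]
    nodes += [("obj", j) for j in range(len(objects))]

    edges = set()
    # Assumption: every pair of nodes within one modality is connected.
    for a, b in combinations(range(len(words)), 2):
        edges.add((("word", a), ("word", b)))
    for a, b in combinations(range(len(objects)), 2):
        edges.add((("obj", a), ("obj", b)))
    # Assumption: a word is linked to each visual object it is aligned with.
    for w, o in alignments:
        edges.add((("word", w), ("obj", o)))
    return nodes, edges


def neighbors(edges, node):
    """Return the sorted list of nodes that share an edge with `node`."""
    found = set()
    for u, v in edges:
        if u == node:
            found.add(v)
        elif v == node:
            found.add(u)
    return sorted(found)


# Hypothetical example: four tokens, two detected objects, three alignments.
words = ["Kevin", "Durant", "enters", "arena"]
objects = ["person", "building"]
alignments = [(0, 0), (1, 0), (3, 1)]

nodes, edges = build_multimodal_graph(words, objects, alignments)
print(len(nodes), "nodes,", len(edges), "edges")
print(neighbors(edges, ("obj", 0)))
```

In a full model, each fusion layer would update a node's representation from its neighbors in this graph; the sketch covers only the graph construction, not the fusion layers, the attention step, or the CRF decoder.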

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: May 18, 2021
Citations: 71

Similar Papers

Multi-modal graph contrastive encoding for neural machine translation
Yongjing Yin ... Jiebo Luo
Artificial Intelligence | VOL. 323
28 Jul 2023

A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
Yongjing Yin ... Zhengyuan Yang
01 Jan 2020

Multimodal deep fusion for image question answering
Weifeng Zhang ... Wei Wang
Knowledge-Based Systems | VOL. 212
28 Nov 2020

On Multi-modal Fusion Learning in constraint propagation
Yaoyi Li ... Hongtao Lu
Information Sciences | VOL. 462
20 Jun 2018
