A multi-task framework based on decomposition for multimodal named entity recognition

Chenran Cai, Qianlong Wang, Bing Qin, Ruifeng Xu

doi:10.1016/j.neucom.2024.128388

Abstract

Given a text-image pair, Multimodal Named Entity Recognition (MNER) is the task of identifying and categorizing the entities in the text. Most existing work labels named entities directly from final token representations obtained by fusing image and text representations. Although these approaches achieve promising results, they may fail to exploit the text and image modalities effectively, because they neglect the different roles the two modalities play: the text modality can detect the boundary of an entity, while the image modality is introduced to disambiguate the entity's category. Based on this observation, we construct two auxiliary tasks using a decomposition strategy and propose a multi-task framework for MNER. Specifically, we first decompose MNER into two auxiliary tasks: entity boundary detection and entity category classification. The former takes only the text modality as input and outputs boundary labels, since text alone achieves satisfactory boundary results. The latter uses both modalities to produce category labels, with the image modality dedicated to disambiguating categories. These two auxiliary tasks allow the text and image modalities to be exploited effectively, each in its own role. We then vectorize the auxiliary-task results so that their label clues can improve entity recognition. Finally, we fuse the features from the text and image modalities with the label embeddings from the auxiliary tasks to perform MNER. Experimental results on two widely used MNER datasets show that our framework achieves new state-of-the-art (SOTA) performance.
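The decomposition described in the abstract can be illustrated at the label level. The sketch below is not the authors' implementation: it assumes entities are annotated with a BIO scheme (tags such as `B-PER`, `I-PER`, `O`), which the abstract does not state, and the function names `decompose_bio` and `recompose_bio` as well as the example sentence are hypothetical. It only shows how one tag sequence could be split into targets for a boundary detection task and a category classification task.

```python
from typing import List, Tuple


def decompose_bio(tags: List[str]) -> Tuple[List[str], List[str]]:
    """Split BIO tags (an assumed scheme) into boundary and category labels."""
    boundary: List[str] = []
    category: List[str] = []
    for tag in tags:
        if tag == "O":
            # Tokens outside any entity get "O" in both auxiliary label sets.
            boundary.append("O")
            category.append("O")
        else:
            # "B-PER" -> boundary "B", category "PER".
            prefix, _, label = tag.partition("-")
            boundary.append(prefix)
            category.append(label)
    return boundary, category


def recompose_bio(boundary: List[str], category: List[str]) -> List[str]:
    """Rebuild full tags from the two auxiliary label sequences."""
    return [b if b == "O" else f"{b}-{c}" for b, c in zip(boundary, category)]


# Hypothetical example sentence and gold tags.
tokens = ["Kevin", "Durant", "joins", "the", "Warriors"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG"]

boundary, category = decompose_bio(tags)
print(boundary)  # ['B', 'I', 'O', 'O', 'B']
print(category)  # ['PER', 'PER', 'O', 'O', 'ORG']
```

In this reading, the boundary sequence would supervise the text-only auxiliary task and the category sequence the text-plus-image auxiliary task. How the framework embeds these labels and fuses them with the modality features is not specified in the abstract and is left out here.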

More From: Neurocomputing