Abstract

Although existing Multimodal Named Entity Recognition (MNER) methods have achieved promising performance, they suffer from two drawbacks in social media scenarios. First, most existing methods rest on the strong assumption that the textual content and the associated images match, which does not always hold in real scenarios. Second, current methods fail to filter out modality-specific random noise, which prevents models from exploiting modality-shared features. In this paper, we propose a novel multi-task multimodal learning architecture that improves MNER performance through cross-modal auxiliary tasks (CMAT). Specifically, we first separate the shared and task-specific features for the main task and the auxiliary tasks via a cross-modal gate-control mechanism. Then, without extra pre-processing or annotations, we employ cross-modal matching to handle mismatched image-text pairs, and cross-modal mutual information maximization to retain the most relevant cross-modal features. Experimental results on two widely used datasets confirm the superiority of the proposed approach.
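To make the gate-control idea concrete, below is a minimal PyTorch sketch of a cross-modal gating module, assuming token-aligned text and visual features of the same dimension. The class name, layer shapes, and gated-sum fusion are illustrative assumptions for exposition, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CrossModalGate(nn.Module):
    """Illustrative cross-modal gate: a sigmoid gate computed from both
    modalities controls how much visual information flows into the text
    representation (dimensions and layer choices are assumptions)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)  # gate from concatenated features
        self.proj = nn.Linear(dim, dim)      # project visual features

    def forward(self, text_feat: torch.Tensor, img_feat: torch.Tensor) -> torch.Tensor:
        # text_feat, img_feat: (batch, seq_len, dim)
        g = torch.sigmoid(self.gate(torch.cat([text_feat, img_feat], dim=-1)))
        # Gated sum: keep the text signal, admit only the gated portion
        # of the visual signal, suppressing modality-specific noise.
        return text_feat + g * self.proj(img_feat)


if __name__ == "__main__":
    fuse = CrossModalGate(dim=768)
    text = torch.randn(2, 16, 768)   # token-level text features
    image = torch.randn(2, 16, 768)  # visual features aligned to tokens
    print(fuse(text, image).shape)   # torch.Size([2, 16, 768])
```

In this sketch the gate output is close to zero for tokens where the image contributes little, so mismatched or noisy visual features are attenuated rather than fused wholesale.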
