Joint multimodal entity-relation extraction based on temporal enhancement and similarity-gated attention

Guoxiang Wang, Jin Liu, Jialong Xie, Zhenwei Zhu, Fengyu Zhou

doi:10.1016/j.knosys.2024.112504

Abstract

Joint Multimodal Entity and Relation Extraction (JMERE), which must combine complex image information to extract entity-relation quintuples from text sequences, places higher demands on a model’s multimodal feature fusion and selection capabilities. With the advancement of large pre-trained language models, existing studies focus on improving the feature alignment between textual and visual modalities. However, there remains a noticeable gap in capturing the temporal information present in textual sequences. In addition, these methods are weak at distinguishing irrelevant images when integrating image and text features, making them susceptible to interference from image information unrelated to the text. To address these challenges, we propose a temporally enhanced and similarity-gated attention network (TESGA) for joint multimodal entity-relation extraction. Specifically, we first incorporate an LSTM-based Text Temporal Enhancement module to enhance the model’s ability to capture temporal information from the text. Next, we introduce a Text-Image Similarity-Gated Attention mechanism, which controls the degree to which image information is incorporated based on the consistency between image and text features. Subsequently, we design the entity and relation prediction module using a form-filling approach based on entity and relation types, and predict entity-relation quintuples. Notably, apart from the JMERE task, our approach can also be applied to other tasks involving text-visual enhancement, such as Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE). To demonstrate the effectiveness of our approach, we conduct extensive experiments on three benchmark datasets, on which our model achieves state-of-the-art performance. Our code will be available upon paper acceptance (https://github.com/vacuum-cup/TESGA).
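The idea of gating image information by text-image consistency can be illustrated with a minimal sketch. The abstract does not give the gate's formula, so everything below is an assumption made for illustration and is not the TESGA implementation: cosine similarity stands in for "consistency", a sigmoid turns it into a gate in (0, 1), and the names `similarity_gate`, `gated_fusion`, `sharpness`, and `threshold` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_gate(text_vec, image_vec, sharpness=5.0, threshold=0.5):
    """Hypothetical gate: sigmoid of the (shifted, scaled) text-image similarity.

    `sharpness` and `threshold` are illustrative parameters, not values
    taken from the paper.
    """
    sim = cosine_similarity(text_vec, image_vec)
    return 1.0 / (1.0 + math.exp(-sharpness * (sim - threshold)))


def gated_fusion(text_vec, image_vec, sharpness=5.0, threshold=0.5):
    """Add image features to text features, scaled by the similarity gate."""
    gate = similarity_gate(text_vec, image_vec, sharpness, threshold)
    return [t + gate * i for t, i in zip(text_vec, image_vec)]


if __name__ == "__main__":
    text = [1.0, 2.0, 3.0]
    related_image = [1.0, 2.0, 3.0]      # same direction as the text features
    unrelated_image = [3.0, 0.0, -1.0]   # orthogonal to the text features

    print(similarity_gate(text, related_image))    # gate close to 1
    print(similarity_gate(text, unrelated_image))  # gate close to 0
    print(gated_fusion(text, unrelated_image))     # stays near the text features
```

Under these assumptions, an image whose features align with the text contributes almost fully, while an unrelated image is largely suppressed, which is the behaviour the abstract attributes to the gating mechanism.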

More From: Knowledge-Based Systems

Similar Papers

Predicting Implicit User Preferences with Multimodal Feature Fusion for Similar User Recommendation in Social Media
Jenq-Haur Wang ... Long Wang
Applied Sciences | VOL. 11
25 Jan 2021

Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction
Guozheng Li ... Wenjun Ke
01 Aug 2024

Meta In-Context Learning Makes Large Language Models Better Zero and Few-Shot Relation Extractors
Guozheng Li ... Yikai Guo
01 Aug 2024

Integrating image and text information for biomedical information retrieval
Sameer Antani
01 Oct 2010
