Abstract

Multi-modal entity relationship recognition is a crucial foundation for constructing accurate and comprehensive domain knowledge graphs. However, owing to the sparsity and diversity of textual semantic information and the semantic gap between modalities, existing methods often suffer from information loss and noisy features. In this paper, we propose the Hierarchical Visual Semantic Guidance (HVSG) network, which uses images to supply visual semantic information for text, facilitating the identification of implicit relationships between entities. Specifically, salient local instance objects are first extracted from the global image; a hierarchical visual semantic construction module then builds multi-level visual semantics, and a visual semantic guidance module fuses the multi-level visual features with the textual features. We also constructed a Chinese multi-modal entity relation classification dataset in the domain of unmanned vessels and conducted experiments on two datasets. Compared with state-of-the-art (SOTA) models, HVSG achieved the best performance, improving F1 scores by 1.05% and 0.58% respectively, which indicates that HVSG is better able to uncover implicit relationships between textual entities.
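Since the abstract only sketches the architecture, the following is a minimal, hypothetical PyTorch sketch of the pipeline it describes, not the authors' implementation. It assumes the global image feature and the salient local instance features are precomputed (e.g., by a CNN backbone and an object detector), and it reads the visual semantic guidance module as cross-attention from text tokens to the stacked multi-level visual semantics; all module names, dimensions, and the residual fusion choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HierarchicalVisualSemantics(nn.Module):
    """Builds multi-level visual semantics from a global image feature and
    salient local instance (object) features. Pooling and projection
    choices here are illustrative assumptions, not the paper's design."""
    def __init__(self, vis_dim: int, hid_dim: int):
        super().__init__()
        self.global_proj = nn.Linear(vis_dim, hid_dim)  # image-level semantics
        self.local_proj = nn.Linear(vis_dim, hid_dim)   # instance-level semantics
        self.scene_proj = nn.Linear(hid_dim, hid_dim)   # pooled instances as a mid level

    def forward(self, global_feat, local_feats):
        # global_feat: (B, vis_dim); local_feats: (B, K, vis_dim)
        g = self.global_proj(global_feat).unsqueeze(1)   # (B, 1, H)
        l = self.local_proj(local_feats)                 # (B, K, H)
        s = self.scene_proj(l.mean(dim=1)).unsqueeze(1)  # (B, 1, H)
        return torch.cat([g, s, l], dim=1)               # (B, K+2, H) multi-level stack

class VisualSemanticGuidance(nn.Module):
    """One plausible reading of the guidance module: text tokens attend
    to the multi-level visual semantics, then fuse via a residual."""
    def __init__(self, hid_dim: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hid_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(hid_dim)

    def forward(self, text_feats, vis_semantics):
        # text_feats: (B, T, H) from a text encoder (e.g., BERT)
        guided, _ = self.attn(query=text_feats, key=vis_semantics, value=vis_semantics)
        return self.norm(text_feats + guided)  # visually guided text representation

# Toy usage with random tensors standing in for encoder/detector outputs.
B, T, K, VIS, H = 2, 16, 5, 2048, 768
hvs = HierarchicalVisualSemantics(VIS, H)
vsg = VisualSemanticGuidance(H)
vis = hvs(torch.randn(B, VIS), torch.randn(B, K, VIS))
fused = vsg(torch.randn(B, T, H), vis)
classifier = nn.Linear(H, 10)           # 10 relation classes, illustrative
logits = classifier(fused.mean(dim=1))  # (B, 10) relation scores
```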
