Abstract

Image-text retrieval, a fundamental cross-modal task, performs similarity reasoning between images and texts. Its primary challenge is cross-modal semantic heterogeneity: the semantic features of the visual and textual modalities are rich but distinct. The scene graph is an effective representation for both images and texts, as it explicitly models objects and their relations. However, existing scene-graph-based methods have not fully exploited the features at various granularities implicit in a scene graph (e.g., triplets); this inadequate feature matching discards non-trivial semantic information (e.g., the inner relations among triplets). We therefore propose a Semantic-Consistency Enhanced Multi-Level Scene Graph Matching (SEMScene) network, which exploits the semantic relevance between visual and textual scene graphs from fine-grained to coarse-grained. First, under the scene graph representation, we perform feature matching at three levels: low-level node matching, mid-level semantic triplet matching, and high-level holistic scene graph matching. Second, to enhance semantic consistency for the object-fused triplets that carry key correlation information, we propose a dual-step constraint mechanism in the mid-level matching. Third, to guide the model to learn the semantic consistency of matched image-text pairs, we devise effective loss functions for each stage of the dual-step constraint. Comprehensive experiments on the Flickr30K and MS-COCO datasets demonstrate that SEMScene achieves state-of-the-art performance with significant improvements.
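To make the three matching levels concrete, the sketch below shows one plausible way to score an image-text pair under a scene graph representation. It is a minimal illustration only, not the paper's implementation: the function names, the triplet fusion by simple concatenation, the mean-pooled graph embedding, and the level weights are all assumptions introduced for this sketch, and it presumes node features for both modalities have already been extracted.

```python
# Minimal sketch of multi-level scene graph matching (hypothetical shapes and
# names; NOT the authors' SEMScene implementation). Assumes node features are
# pre-extracted for both modalities, with relations also encoded as nodes, and
# triplets given as (subject, relation, object) node-index tuples.
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a (m, d) and b (n, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def triplet_features(nodes, triplets):
    """Fuse (subject, relation, object) node features into one vector per triplet."""
    return np.stack([np.concatenate([nodes[s], nodes[r], nodes[o]])
                     for s, r, o in triplets])

def multi_level_score(v_nodes, t_nodes, v_triplets, t_triplets,
                      weights=(1.0, 1.0, 1.0)):
    """Combine low-, mid-, and high-level similarities into one pair score."""
    # Low level: average over each visual node's best-matching textual node.
    node_score = cosine_sim(v_nodes, t_nodes).max(axis=1).mean()
    # Mid level: similarity of object-fused triplet features across modalities.
    trip_score = cosine_sim(triplet_features(v_nodes, v_triplets),
                            triplet_features(t_nodes, t_triplets)).max(axis=1).mean()
    # High level: cosine similarity of holistic (mean-pooled) graph embeddings.
    g_v, g_t = v_nodes.mean(axis=0), t_nodes.mean(axis=0)
    graph_score = float(g_v @ g_t / (np.linalg.norm(g_v) * np.linalg.norm(g_t)))
    w1, w2, w3 = weights
    return w1 * node_score + w2 * trip_score + w3 * graph_score

# Toy usage with random 64-d features for 5 visual and 6 textual nodes.
rng = np.random.default_rng(0)
v_nodes, t_nodes = rng.normal(size=(5, 64)), rng.normal(size=(6, 64))
print(multi_level_score(v_nodes, t_nodes, [(0, 1, 2)], [(0, 2, 3)]))
```

In this toy form, the dual-step constraint the abstract describes would act on the mid-level term, first aligning triplets across modalities and then constraining their fused features toward semantic consistency; the sketch collapses that into a single similarity for brevity.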
