Abstract

Referring image segmentation aims to segment the instance corresponding to a given language description, which requires aligning information from two modalities. Existing approaches usually align cross-modal information based on different forms of feature units, such as pixel-sentence, pixel-word, and patch-word pairs. The problem is that the semantic information carried by these feature units may be mismatched; for example, the semantics conveyed by a single pixel cover only a part of the semantics of a sentence. When such inconsistent information is used to model the relationship between feature units from the two modalities, the resulting cross-modal relationships are imprecise, leading to inaccurate cross-modal features. In this paper, we propose to generate scalable area and keyword features so that the feature units from the two modalities have comparable semantic granularity. Meanwhile, the scalable features provide sparse representations of the image and text, which reduces the computational complexity of computing cross-modal features. In addition, we design a novel multi-source-driven dynamic convolution that inversely maps the area-keyword cross-modal features back to the image to predict the mask. Experimental results on three benchmark datasets demonstrate that our proposed framework achieves advanced performance while greatly reducing the computational cost of the model.
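As a rough illustration of the dynamic-convolution idea mentioned above, the sketch below generates per-sample convolution kernels from a fused cross-modal feature and applies them to the image feature map to produce mask logits. This is a minimal Python/PyTorch sketch, not the paper's actual multi-source design; all names (`CrossModalDynamicHead`, `fused_dim`, etc.) are hypothetical.

```python
# Illustrative sketch only: dynamic convolution whose kernels are generated
# from a fused area-keyword feature and applied to image features to predict
# a mask. Names and dimensions are assumptions, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalDynamicHead(nn.Module):
    def __init__(self, img_dim: int, fused_dim: int, kernel_size: int = 1):
        super().__init__()
        self.kernel_size = kernel_size
        # Generate one per-sample kernel (single output channel -> mask logits)
        # from the pooled cross-modal representation.
        self.kernel_gen = nn.Linear(fused_dim, img_dim * kernel_size * kernel_size)

    def forward(self, img_feat: torch.Tensor, fused_feat: torch.Tensor) -> torch.Tensor:
        # img_feat:   (B, C, H, W) visual feature map
        # fused_feat: (B, D)       pooled area-keyword cross-modal feature
        B, C, H, W = img_feat.shape
        k = self.kernel_size
        weight = self.kernel_gen(fused_feat).view(B, C, k, k)
        # Grouped-convolution trick: fold the batch into channels so that each
        # sample is convolved with its own dynamically generated kernel.
        out = F.conv2d(
            img_feat.reshape(1, B * C, H, W),
            weight,
            groups=B,
            padding=k // 2,
        )
        return out.view(B, 1, H, W)  # per-pixel mask logits

# Usage (hypothetical dimensions):
# head = CrossModalDynamicHead(img_dim=256, fused_dim=512)
# mask_logits = head(img_feat, fused_feat)
```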
