Abstract

Referring image segmentation has recently attracted wide attention owing to its great potential in human-robot interaction. To identify the referred region, a network must develop a deep understanding of both the image and the language semantics. To this end, existing works design various cross-modality fusion mechanisms, such as tile-and-concatenate and vanilla non-local operations. However, such plain fusion is usually either too coarse or constrained by prohibitive computational overhead, ultimately yielding an insufficient understanding of the referent. In this work, we propose a fine-grained semantic funneling infusion (FSFI) mechanism to address this problem. FSFI imposes a constant spatial constraint on the querying entities from different encoding stages and dynamically infuses the gleaned language semantics into the vision branch. Moreover, it decomposes the features of both modalities into finer components, allowing fusion to take place in multiple low-dimensional spaces. Such fusion is more effective than fusion in a single high-dimensional space, as it can sink more representative information along the channel dimension. Another problem haunting the task is that injecting highly abstract semantics blurs the details of the referent. To address this, we propose a multiscale attention-enhanced decoder (MAED). We design a detail enhancement operator (DeEh) and apply it in a multiscale, progressive manner: higher-level features generate attention guidance that directs lower-level features to attend more to detail regions. Extensive experiments on challenging benchmarks show that our network performs favorably against the state-of-the-art methods.
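To make the FSFI idea concrete, below is a minimal PyTorch sketch of the fusion pattern the abstract describes: both modalities are decomposed into low-dimensional channel groups, the visual query is pooled to a fixed spatial grid (the "constant spatial constraint"), and per-group cross-modal attention infuses language semantics into the vision branch. The module name, shapes, and the exact attention form are assumptions for illustration; the paper's actual design may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FSFISketch(nn.Module):
    """Illustrative sketch of fine-grained semantic funneling infusion (assumed design)."""
    def __init__(self, vis_dim=256, lang_dim=256, groups=8, query_size=8):
        super().__init__()
        assert vis_dim % groups == 0
        self.groups = groups
        self.g_dim = vis_dim // groups        # low-dimensional fusion space per group
        self.query_size = query_size          # fixed query grid, shared across stages (assumed)
        self.lang_proj = nn.Linear(lang_dim, vis_dim)
        self.out_proj = nn.Conv2d(vis_dim, vis_dim, kernel_size=1)

    def forward(self, vis, lang):
        # vis: (B, C, H, W) visual features from one encoding stage
        # lang: (B, L, C_l) word-level language features
        B, C, H, W = vis.shape
        # Constant spatial constraint: pool queries to a fixed grid regardless of stage.
        q = F.adaptive_avg_pool2d(vis, self.query_size)               # (B, C, s, s)
        q = q.flatten(2).transpose(1, 2)                              # (B, s*s, C)
        k = self.lang_proj(lang)                                      # (B, L, C)
        # Decompose both modalities into `groups` low-dimensional components.
        q = q.reshape(B, -1, self.groups, self.g_dim).transpose(1, 2) # (B, G, s*s, d)
        k = k.reshape(B, -1, self.groups, self.g_dim).transpose(1, 2) # (B, G, L, d)
        # Per-group cross-modal attention: fusion happens in G low-dim spaces.
        attn = torch.softmax(q @ k.transpose(-1, -2) / self.g_dim ** 0.5, dim=-1)
        fused = attn @ k                                              # (B, G, s*s, d)
        fused = fused.transpose(1, 2).reshape(B, -1, C)               # (B, s*s, C)
        fused = fused.transpose(1, 2).reshape(B, C, self.query_size, self.query_size)
        # Upsample the gleaned semantics and infuse them into the vision branch.
        fused = F.interpolate(fused, size=(H, W), mode='bilinear', align_corners=False)
        return vis + self.out_proj(fused)

# Example usage with dummy inputs:
# fsfi = FSFISketch()
# out = fsfi(torch.randn(2, 256, 40, 40), torch.randn(2, 12, 256))  # (2, 256, 40, 40)
```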
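Similarly, the following sketch illustrates one plausible reading of the DeEh operator inside MAED: higher-level features produce a spatial attention map that guides lower-level features toward detail regions, and the decoder applies this progressively from coarse to fine scales. Channel sizes, the sigmoid gating, and the pyramid names f4..f1 are hypothetical choices, not the paper's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeEhSketch(nn.Module):
    """Illustrative sketch of the detail enhancement operator (assumed design)."""
    def __init__(self, dim=256):
        super().__init__()
        self.guide = nn.Conv2d(dim, 1, kernel_size=1)   # attention guidance from high level
        self.fuse = nn.Sequential(
            nn.Conv2d(dim * 2, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, low, high):
        # low:  (B, C, H, W) lower-level, detail-rich features
        # high: (B, C, h, w) higher-level, semantic features with h <= H, w <= W
        high = F.interpolate(high, size=low.shape[-2:], mode='bilinear', align_corners=False)
        attn = torch.sigmoid(self.guide(high))          # (B, 1, H, W) detail guidance
        enhanced = low * attn + low                     # re-weight details, keep a residual path
        return self.fuse(torch.cat([enhanced, high], dim=1))

# Progressive multiscale use in a decoder, given a hypothetical pyramid f4..f1
# (coarsest to finest) and one DeEhSketch module per scale:
# out = f4
# for deeh, low in zip(deeh_modules, (f3, f2, f1)):
#     out = deeh(low, out)
```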
