Mutual Information Regularization for Weakly-Supervised RGB-D Salient Object Detection

Aixuan Li, Yuxin Mao, Yuchao Dai, Jing Zhang

doi:10.1109/tcsvt.2023.3285249

Abstract

In this paper, we present a weakly-supervised RGB-D salient object detection model trained with scribble supervision. Treating the problem as a multimodal learning task, we focus on effective multimodal representation learning via inter-modal mutual information regularization. In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound as a mutual information minimization regularizer to encourage a disentangled representation of each modality for salient object detection. Based on our multimodal representation learning framework, we introduce an asymmetric feature extractor for our multimodal data, which proves more effective than the conventional symmetric backbone setting. We also introduce a multimodal variational auto-encoder as a stochastic prediction refinement technique, which takes pseudo labels from the first training stage as supervision and generates refined predictions. Experimental results on benchmark RGB-D salient object detection datasets verify the effectiveness of both our explicit multimodal disentangled representation learning method and the stochastic prediction refinement strategy, achieving performance comparable with state-of-the-art fully supervised models. Our code and data are available at: https://npucvr.github.io/MIRV/.
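To make the idea of a mutual information upper bound used as a minimization regularizer more concrete, the following is a minimal sketch of one possible sample-based estimator, in the style of a contrastive log-ratio upper bound. The abstract does not state which bound the authors use, so the estimator form, the fixed-variance Gaussian conditional, and all names here (`club_style_upper_bound`, `mu_fn`, `log_var`) are assumptions for illustration only, not the paper's implementation.

```python
import numpy as np

def club_style_upper_bound(x, y, mu_fn, log_var=0.0):
    """Sample-based, CLUB-style estimate of an upper bound on I(x; y).

    Assumption (not from the paper): q(y | x) is a Gaussian with mean
    mu_fn(x) and a shared fixed variance exp(log_var).
    x, y: arrays of shape (N, D) holding paired feature vectors,
    e.g. RGB features and depth features for the same N images.
    """
    mu = mu_fn(x)                                   # (N, D) predicted mean of y given x
    var = np.exp(log_var)
    # log q(y_i | x_i) for matched pairs (normalizing constant cancels below)
    positive = -((y - mu) ** 2).sum(axis=1) / (2.0 * var)      # (N,)
    # log q(y_j | x_i) for every pairing of the two sample sets
    diff = y[None, :, :] - mu[:, None, :]                       # (N, N, D)
    all_pairs = -(diff ** 2).sum(axis=2) / (2.0 * var)          # (N, N)
    # Matched-pair average minus all-pair average
    return float(positive.mean() - all_pairs.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_feat = rng.normal(size=(256, 4))                        # stand-in "RGB" features
    entangled = rgb_feat + 0.1 * rng.normal(size=(256, 4))      # strongly dependent features
    independent = rng.normal(size=(256, 4))                     # unrelated features
    identity = lambda a: a                                      # hypothetical predictor of y from x
    print("dependent pair:  ", club_style_upper_bound(rgb_feat, entangled, identity))
    print("independent pair:", club_style_upper_bound(rgb_feat, independent, identity))
```

In a training loop, a term of this kind would typically be added to the task loss with a small weight, so that lowering it pushes the two modality-specific feature sets toward carrying less shared information. In practice `mu_fn` would be a learned network rather than the identity used in this toy example.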

Journal: IEEE Transactions on Circuits and Systems for Video Technology
Publication Date: Jan 1, 2024
Citations: 3

Similar Papers

Learning Explainable Disentangled Representations of E-Commerce Data by Aligning Their Visual and Textual Attributes
Katrien Laenen ... Marie-Francine Moens
Computers | VOL. 11
10 Dec 2022

Multimodal Representation Learning: Advances, Trends and Challenges
Su-Fang Zhang ... Yan Zhan
01 Jul 2019

Adapt and explore: Multimodal mixup for representation learning
Ronghao Lin ... Haifeng Hu
Information Fusion | VOL. 105
28 Dec 2023

Learning Multimodal Representations by Symmetrically Transferring Local Structures
Bin Dong ... Kai Lu
Symmetry | VOL. 12
13 Sep 2020
