Abstract

Weakly-supervised video object segmentation is an emerging video task that tracks and segments a target given only a simple bounding-box label, which requires the method to fully capture and exploit the target information. Most existing approaches rely on the guidance of a single frame and ignore the interaction between frames when gathering information, making it hard for them to achieve a reliable target representation. In this paper, we propose to capture temporal dependencies and gather information from multiple frames through bilateral temporal re-aggregation. We explore three schemes to build this aggregation: 1) a two-stage re-aggregation mechanism supplies a target prior to the current frame, yielding more valid feature matching and information aggregation; 2) a query-memory bilateral aggregation module aggregates features from an unlimited number of past frames and enables mutual perception between frames to validate the gathered information; 3) we guide the learning of the aggregation modules through a novel cross-task representation distillation, transferring knowledge from a semi-supervised model to our weakly-supervised model without increasing inference latency. Together, these schemes build an efficient and competent aggregation process, allowing us to fully exploit the video context at inference time. Experimental results on four benchmarks show that our method outperforms previous methods while maintaining efficiency (e.g., overall scores of 70.4% and 72.5% on the YouTube-VOS and DAVIS 2017 validation sets, respectively).
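The abstract does not detail the query-memory aggregation module, but memory-based VOS methods commonly read from stored past-frame features with an attention-style affinity between the current (query) frame and the memory. Below is a minimal, generic sketch of such a memory read; the function name, shapes, and single-head formulation are illustrative assumptions, not the paper's actual module.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def memory_read(query_feat, memory_keys, memory_values):
    """Aggregate past-frame information into the current (query) frame.

    query_feat:    (N_q, C) flattened current-frame features
    memory_keys:   (N_m, C) key features pooled from all stored past frames
    memory_values: (N_m, C) corresponding value features
    Returns (N_q, C): memory features aggregated per query location.
    """
    scale = 1.0 / np.sqrt(query_feat.shape[-1])
    # Affinity between every query location and every memory location.
    affinity = softmax(query_feat @ memory_keys.T * scale, axis=-1)  # (N_q, N_m)
    return affinity @ memory_values

# Toy usage: a memory of 8 locations (e.g. two past frames of 4 locations each),
# 8-dimensional features.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
mk = rng.standard_normal((8, 8))
mv = rng.standard_normal((8, 8))
out = memory_read(q, mk, mv)
print(out.shape)  # (4, 8)
```

Because the memory is a flat set of key/value locations, this formulation scales to an arbitrary number of past frames, which matches the abstract's claim of aggregating from an unlimited amount of history.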
