Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

Minghan Li, Lida Li, Shuai Li, Lei Zhang

doi:10.1109/cvpr46437.2021.01106

Open Access

Abstract

Modern one-stage video instance segmentation networks suffer from two limitations. First, convolutional features are aligned with neither anchor boxes nor ground-truth bounding boxes, which reduces the sensitivity of the masks to spatial location. Second, a video is split directly into individual frames for frame-level instance segmentation, ignoring the temporal correlation between adjacent frames. To address these issues, we propose STMask, a simple yet effective one-stage video instance segmentation framework based on spatial calibration and temporal fusion. To calibrate spatial features with ground-truth bounding boxes, we first predict regressed bounding boxes around the ground-truth bounding boxes and extract features from them for frame-level instance segmentation. To further exploit the temporal correlation among video frames, we add a temporal fusion module that infers instance masks from each frame to its adjacent frames, which helps our framework handle challenging videos with motion blur, partial occlusion, and unusual object-to-camera poses. Experiments on the YouTube-VIS validation set show that STMask with a ResNet-50/-101 backbone obtains 33.5% / 36.8% mask AP while running at 28.6 / 23.4 FPS on video instance segmentation. The code is available at https://github.com/MinghanLi/STMask.
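
As a rough illustration of the box-aligned feature extraction the abstract describes, the sketch below crops the features inside a predicted box and resamples them to a fixed-size grid. It is not the STMask implementation: the function name `crop_and_resize`, the plain nested-list feature map, and the nearest-neighbour sampling are all assumptions made to keep the example self-contained.

```python
def crop_and_resize(feature_map, box, out_h, out_w):
    """Crop a 2D feature map to `box` and resample it to out_h x out_w.

    Hypothetical helper for illustration only. `box` is (x1, y1, x2, y2)
    in feature-map coordinates, with x2 and y2 exclusive. Sampling is
    nearest-neighbour (an assumption; real systems often use bilinear).
    """
    x1, y1, x2, y2 = box
    height, width = len(feature_map), len(feature_map[0])

    # Clamp the box so it never reads outside the feature map.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box does not overlap the feature map")

    out = []
    for i in range(out_h):
        # Map output row i back to a source row inside the box.
        src_y = y1 + (i * (y2 - y1)) // out_h
        row = []
        for j in range(out_w):
            # Map output column j back to a source column inside the box.
            src_x = x1 + (j * (x2 - x1)) // out_w
            row.append(feature_map[src_y][src_x])
        out.append(row)
    return out


# Toy 4 x 6 feature map whose value encodes its own (row, column).
features = [[r * 10 + c for c in range(6)] for r in range(4)]
aligned = crop_and_resize(features, (1, 1, 5, 3), out_h=2, out_w=2)
print(aligned)
```

Because the crop follows the predicted box rather than a fixed anchor, the fixed-size output always covers the same relative region of the object, which is the intuition behind feature calibration.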

Publication Date: Jun 1, 2021
Citations: 24
License type: other-oa

Similar Papers

Quantifying the Effects of Ground Truth Annotation Quality on Object Detection and Instance Segmentation Performance
Cathaoir Agnew ... Patrick Denny
IEEE Access | VOL. 11 | 01 Jan 2023

Self-paced Learning to Improve Text Row Detection in Historical Documents with Missing Labels
Mihaela Găman ... Marius Popescu
01 Jan 2023

Syncretic-NMS: A Merging Non-Maximum Suppression Algorithm for Instance Segmentation
Jun Chu ... Shaoming Li
IEEE Access | VOL. 8 | 01 Jan 2020

Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review.
Dengshan Li ... Rujing Wang
Micromachines | VOL. 13 | 31 Dec 2021
