Abstract

Recently, research efforts have emerged to extend CNN-based segmentation approaches from still images to video. Directly applying per-frame image segmentation networks to video is inefficient. To address this issue, a promising direction is to exploit temporal continuity in video. A state-of-the-art approach, Deep Feature Flow (DFF), runs the segmentation network only on sparse key frames and propagates their feature maps to the remaining frames via cross-frame motion. However, this approach does not work well for hand segmentation in video because it is not robust to hand posture changes. In this paper, we propose to incorporate a light-weight detail enhancement network (DEN) into the DFF framework to achieve robustness against cross-frame motion and hand posture change. Experimental results on a public depth video dataset, FingerPaint, demonstrate that our approach achieves higher segmentation accuracy than the DFF-based approach while retaining a similar speedup over the per-frame video hand segmentation approach.
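
To make the key-frame scheme concrete, the following is a minimal sketch (in PyTorch) of a DFF-style pipeline with an added detail enhancement branch. Everything in it is an illustrative assumption rather than the architecture from the paper: the TinyBackbone, TinyFlowNet, and TinyDEN modules, the key-frame interval K, the additive feature fusion, and the input sizes are all placeholders chosen only to show how sparse key-frame features can be warped forward and combined with a light-weight per-frame branch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBackbone(nn.Module):
    # Placeholder for the heavy per-frame segmentation backbone.
    def __init__(self, in_ch=1, feat_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class TinyFlowNet(nn.Module):
    # Placeholder for a light-weight motion estimator (key frame -> current frame).
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),
        )
    def forward(self, key_frame, cur_frame):
        return self.net(torch.cat([key_frame, cur_frame], dim=1))

class TinyDEN(nn.Module):
    # Placeholder for the light-weight detail enhancement network.
    def __init__(self, in_ch=1, feat_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

def warp(feat, flow):
    # Bilinearly warp key-frame features to the current frame using a flow
    # field given in feature-map pixels (shape: B x 2 x H x W).
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).float().unsqueeze(0)   # 1 x H x W x 2
    grid = base + flow.permute(0, 2, 3, 1)                      # displaced coordinates
    gx = 2.0 * grid[..., 0] / max(w - 1, 1) - 1.0               # normalize to [-1, 1]
    gy = 2.0 * grid[..., 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(feat, torch.stack([gx, gy], dim=-1), align_corners=True)

backbone, flownet, den = TinyBackbone(), TinyFlowNet(), TinyDEN()
head = nn.Conv2d(32, 2, 1)                                # hand / background logits
K = 5                                                     # key-frame interval (assumed)
frames = [torch.rand(1, 1, 64, 64) for _ in range(10)]    # dummy depth frames
key_frame = key_feat = None
with torch.no_grad():
    for i, frame in enumerate(frames):
        if i % K == 0:
            key_frame, key_feat = frame, backbone(frame)  # expensive, run only sparsely
            feat = key_feat
        else:
            flow = flownet(key_frame, frame)              # cheap motion estimate
            feat = warp(key_feat, flow)                   # propagate key-frame features
            feat = feat + den(frame)                      # fuse per-frame detail features
        mask = head(feat).argmax(dim=1)                   # low-resolution hand mask

The point of the sketch is that the heavy backbone runs only on every K-th frame; the other frames cost only a flow estimate, a warp, and the light-weight detail branch, which is where the speedup over per-frame segmentation comes from while the detail branch compensates for propagation errors under posture change.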
