Abstract

Recently, deep convolutional networks have achieved great success in hand segmentation for still images. However, such CNN-based methods mainly perform per-frame inference, which is inefficient for hand segmentation in videos because contiguous video frames are highly continuous and redundant. We present an approach that extends CNN-based hand segmentation methods from still images to video. It consists of two main branches: flow-guided feature propagation and light-weight occlusion-aware detail enhancement. The flow-guided feature propagation branch runs an image segmentation network only on sparse frames and warps the intermediate features to the remaining frames according to the cross-frame flow field. Compared with per-frame inference, this achieves a significant speedup, but on its own it suffers a large accuracy degradation due to distortion and occlusion introduced by the warping. The light-weight occlusion-aware detail enhancement branch extracts low-level detail features from each frame with spatial attention on occluded regions; with it, our approach becomes more robust to these distortion and occlusion issues and achieves a better accuracy–latency tradeoff. Experiments on three public egocentric video datasets, namely EgoHands, GTEA and EDSH, demonstrate the effectiveness and efficiency of our approach.
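To make the propagation step concrete, below is a minimal PyTorch sketch (not the authors' implementation) of the bilinear feature warping that flow-guided propagation typically relies on: features computed on one of the sparse frames are resampled at flow-displaced coordinates to approximate the features of a nearby frame. The function name `warp_features` and the tensor conventions are illustrative assumptions, not from the paper.

```python
import torch
import torch.nn.functional as F

def warp_features(src_feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp features from a sparsely processed frame to the current frame.

    src_feat: (N, C, H, W) intermediate features from the processed frame.
    flow:     (N, 2, H, W) flow from the current frame back to that frame,
              in pixel units at feature resolution (illustrative convention).
    """
    n, _, h, w = src_feat.shape
    # Base sampling grid of pixel coordinates (x, y).
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=src_feat.dtype, device=src_feat.device),
        torch.arange(w, dtype=src_feat.dtype, device=src_feat.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0).unsqueeze(0)  # (1, 2, H, W)
    coords = base + flow                              # flow-displaced coordinates
    # Normalize coordinates to [-1, 1], as required by grid_sample.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)              # (N, H, W, 2)
    return F.grid_sample(src_feat, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```

In this sketch, `flow` would come from a flow estimator (the abstract does not specify which one), and the warped features stand in for a full backbone pass on the non-processed frames, which is the source of the speedup. Warping errors in distorted or occluded regions are exactly what the occlusion-aware detail enhancement branch is described as compensating for.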
