Abstract
Recently, deep convolutional networks have achieved great success in image-based hand segmentation. However, such CNN-based methods mainly perform per-frame inference, which is inefficient for hand segmentation in videos, where contiguous frames exhibit strong continuity and redundancy. We present an approach for extending CNN-based hand segmentation methods from still images to videos. It consists of two main branches: flow-guided feature propagation and lightweight occlusion-aware detail enhancement. The flow-guided feature propagation branch runs an image segmentation network only on sparse key frames and warps the intermediate features to the remaining frames according to the cross-frame flow field. Compared with per-frame inference, this yields a significant speedup, but warping alone suffers accuracy degradation due to distortion and occlusion. The lightweight occlusion-aware detail enhancement branch extracts low-level detail features from each frame with spatial attention on occluded regions, making our approach more robust to distortion and occlusion and yielding a better accuracy–latency tradeoff. Experiments on three public egocentric video datasets, EgoHands, GTEA, and EDSH, demonstrate the effectiveness and efficiency of our approach.
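The abstract does not specify the warping operator, so the following is only a minimal sketch of the bilinear flow warping commonly used in flow-guided feature propagation, assuming PyTorch, a pixel-unit flow field, and matching feature/flow resolutions; the function name warp_features and its tensor layout are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def warp_features(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp key-frame features to the current frame via a flow field.

    feat: (N, C, H, W) intermediate features computed on the key frame.
    flow: (N, 2, H, W) flow from the current frame back to the key frame,
          in pixel units (flow[:, 0] = x-offset, flow[:, 1] = y-offset).
    """
    n, _, h, w = feat.shape
    # Base sampling grid of pixel coordinates in the key frame.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype, device=feat.device),
        torch.arange(w, dtype=feat.dtype, device=feat.device),
        indexing="ij",
    )
    # Shift each pixel by the flow to find where to sample in the key frame.
    grid_x = xs.unsqueeze(0) + flow[:, 0]
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize coordinates to [-1, 1], as grid_sample expects.
    grid_x = 2.0 * grid_x / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid_y / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (N, H, W, 2)
    # Bilinear sampling propagates key-frame features to the current frame.
    return F.grid_sample(feat, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```

Pixels that are occluded or poorly matched between frames receive distorted features under this warping, which is precisely the failure mode the occlusion-aware detail enhancement branch is introduced to compensate for.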