Recently, deep convolutional networks have achieved great success in image-based hand segmentation. However, such CNN-based methods mainly perform per-frame inference, which is inefficient for hand segmentation in videos, since contiguous frames are highly continuous and redundant. We present an approach that extends CNN-based hand segmentation from still images to video. It consists of two main branches: flow-guided feature propagation and lightweight occlusion-aware detail enhancement. The flow-guided feature propagation branch runs an image segmentation network only on sparse key frames and warps their intermediate features to the remaining frames according to the cross-frame flow field. Compared with per-frame inference, this achieves a significant speedup but suffers a large accuracy degradation due to distortion and occlusion in the warping. By introducing the lightweight occlusion-aware detail enhancement branch, which extracts low-level detail features from each frame with spatial attention focused on occluded regions, our approach becomes more robust to distortion and occlusion and achieves a better accuracy–latency tradeoff. Experiments on three public egocentric video datasets, namely EgoHands, GTEA, and EDSH, demonstrate the effectiveness and efficiency of our approach.
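To make the flow-guided propagation step concrete, the sketch below shows backward warping of a key-frame feature map by a flow field using bilinear sampling, the core operation such propagation relies on. This is a minimal NumPy illustration under assumed conventions (flow stored as per-pixel `(dx, dy)` offsets from the current frame back to the key frame, border clamping); the function name and tensor shapes are illustrative, not the authors' implementation.

```python
import numpy as np

def warp_features(feat, flow):
    """Backward-warp key-frame features to the current frame.

    feat: (C, H, W) features computed on a sparse key frame.
    flow: (2, H, W) assumed flow from the current frame back to the
          key frame, as (dx, dy) pixel offsets.
    Returns a (C, H, W) array sampled bilinearly, clamped at borders.
    """
    C, H, W = feat.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    # Sampling locations in the key frame for each current-frame pixel.
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    x0 = np.floor(sx).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    y0 = np.floor(sy).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = sx - x0, sy - y0
    # Bilinear interpolation over the four neighbouring feature vectors.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bot = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

Warping reuses the expensive key-frame features at the cost of artifacts wherever the flow is unreliable (e.g., at occlusion boundaries), which is exactly what the detail enhancement branch is meant to compensate for.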