Abstract

Although state-of-the-art methods for human pose estimation achieve superior results on single images, their performance on videos usually deteriorates dramatically due to motion blur and occlusion. Since video frames are closely correlated in time, properly exploiting this contextual information can help tackle the problem. In this paper, we present a Temporal Feature Enhancing Network (TFEN) for video human pose estimation. It boosts per-frame features by utilizing motion information in the form of optical flow and by performing temporal feature encoding with convolutional gated recurrent units (ConvGRU). It is an end-to-end learning framework and can extend any image-based algorithm to video pose estimation. Experimental results validate the effectiveness of the proposed approach on two large-scale video pose estimation benchmarks.
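To make the temporal encoding step concrete, the sketch below shows a minimal ConvGRU cell of the kind the abstract refers to: GRU gating computed with 2-D convolutions so the hidden state retains its spatial layout and can aggregate per-frame feature maps over time. This is an illustrative sketch only; the channel counts, kernel size, and class names are assumptions for demonstration, not the authors' TFEN implementation.

```python
# Minimal ConvGRU cell (illustrative sketch, not the authors' TFEN code).
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """A single ConvGRU cell: GRU gates computed with 2-D convolutions,
    so the hidden state keeps its spatial layout (B, C, H, W)."""

    def __init__(self, in_channels: int, hidden_channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        # Update (z) and reset (r) gates computed jointly from [x, h].
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               2 * hidden_channels, kernel_size, padding=padding)
        # Candidate hidden state computed from [x, r * h].
        self.candidate = nn.Conv2d(in_channels + hidden_channels,
                                   hidden_channels, kernel_size, padding=padding)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.candidate(torch.cat([x, r * h], dim=1)))
        # Standard GRU update: interpolate between old state and candidate.
        return (1 - z) * h + z * h_tilde

# Usage: run the cell over per-frame feature maps of a short clip.
if __name__ == "__main__":
    cell = ConvGRUCell(in_channels=64, hidden_channels=64)
    frames = torch.randn(5, 1, 64, 32, 32)   # (T, B, C, H, W) feature maps
    h = torch.zeros(1, 64, 32, 32)           # initial hidden state
    for t in range(frames.shape[0]):
        h = cell(frames[t], h)               # temporally enhanced features
```

In a pipeline like the one described, features from neighboring frames would typically be aligned (e.g., warped by optical flow) before being fed through such a recurrent unit, so that the gating operates on spatially corresponding locations.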
