Abstract

Effectively utilizing temporal information is critical for human pose estimation in videos. Recent methods either neglect the displacements of keypoints across video frames, or rely on time-consuming optical flow estimation when fusing temporal information. By contrast, we propose a flow-free and displacement-aware algorithm for pose estimation in videos. Our method is based on the observation that the appearance of body keypoints remains almost unchanged throughout a video. This motivates us to exploit the temporal visual consistency of keypoints via temporal feature correlation, establishing sparse correspondences between keypoints in neighboring frames. Specifically, we first extract keypoint features from the previous frame, which serve as exemplars to search for on the intermediate feature map of the current frame. We then conduct temporal feature correlation for the keypoint search, and the resulting correlation maps are combined with the convolutional features to further guide heatmap estimation. Extensive experiments demonstrate that the proposed method compares favorably against state-of-the-art approaches on both the sub-JHMDB and Penn Action datasets. More importantly, our method is robust to large keypoint displacements and can be applied to videos with fast motion.
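The exemplar search described above can be illustrated with a minimal sketch: a feature patch around a keypoint in the previous frame is correlated, as a sliding template, against the current frame's feature map, yielding a correlation map that peaks near the keypoint's new location. All names, the patch size, and the plain dot-product similarity are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def temporal_keypoint_correlation(prev_feat, curr_feat, kp_yx, patch=3):
    """Illustrative sketch of temporal feature correlation.

    prev_feat, curr_feat: (C, H, W) feature maps from consecutive frames.
    kp_yx: (y, x) keypoint location on the previous frame's feature map.
    Returns an (H, W) correlation map over the current frame.
    """
    C, H, W = curr_feat.shape
    r = patch // 2
    y, x = kp_yx
    # Exemplar: feature patch centered on the keypoint in the previous frame.
    exemplar = prev_feat[:, y - r:y + r + 1, x - r:x + r + 1]
    # Zero-pad the current features so the output keeps the spatial size.
    padded = np.pad(curr_feat, ((0, 0), (r, r), (r, r)), mode="constant")
    corr = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            window = padded[:, i:i + patch, j:j + patch]
            # Dot-product similarity between exemplar and local window.
            corr[i, j] = np.sum(window * exemplar)
    return corr
```

In the full method, such per-keypoint correlation maps would be concatenated with the convolutional features before heatmap estimation; here the argmax of the map simply indicates where the exemplar matches best.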
