Abstract

Human pose estimation can be applied to many computer vision tasks, such as human–computer interaction, motion recognition, and action detection. However, few previous methods have focused on pose estimation in crowded scenes. Connection-based bottom-up approaches are the main pipelines for multi-person pose estimation, and they consist of three main processes: keypoint detection, connection detection, and pose assembly. The prediction accuracy of all three processes degrades significantly when they are applied to crowded scenes. In this paper, we use an improved method called Keypoint Likelihood Variance Reduction (KLVR) to decode the keypoint representation and improve keypoint detection accuracy in crowded scenes. Moreover, we apply a noise filter after keypoint detection to suppress noise peaks that negatively affect pose assembly. In addition, to address the problem of isolated human parts caused by occlusion in crowded scenes, we use a Cycle Skeleton Structure (CSS) in the pose assembly process. In our experiments, our method outperforms previous methods on the CrowdPose test dataset.
