Human pose estimation in crowded scenes using Keypoint Likelihood Variance Reduction

Longsheng Wei, Xuefu Yu, Zhiheng Liu

doi:10.1016/j.displa.2024.102675

Abstract

Human pose estimation supports many computer vision tasks, such as human–computer interaction, motion recognition, and action detection. However, few previous methods have focused on pose estimation in crowded scenes. Connection-based bottom-up approaches are the main pipelines for multi-person pose estimation, and they consist of three main processes: keypoint detection, connection detection, and pose assembly. The prediction accuracy of all three processes degrades significantly in crowded scenes. In this paper, we use an improved method called Keypoint Likelihood Variance Reduction (KLVR) to decode the keypoint representation and improve keypoint detection accuracy in crowded scenes. Moreover, we apply a noise filter after keypoint detection to suppress the noise peaks that negatively affect pose assembly. In addition, to address the problem of isolated human parts caused by occlusion in crowded scenes, we use a Cycle Skeleton Structure (CSS) in the pose assembly process. In experiments, our method outperforms previous methods on the CrowdPose test set.

Full Text