Abstract

Human pose estimation has numerous applications in motion recognition, virtual reality, human–computer interaction, and related fields. However, multi-person pose estimation in crowded and occluded scenes remains challenging. A major limitation of current top-down approaches is that they predict the pose of only a single person per bounding box, even when the box contains multiple individuals. To address this problem, we propose a novel Crowd and Occlusion-aware Network (CONet) based on a divide-and-conquer strategy. Our approach includes a Crowd and Occlusion-aware Head (COHead), which estimates the poses of both the occluder and the occluded person using two separate branches. We also use an attention mechanism to guide the two branches toward differentiated learning, with the aim of improving feature representation. Additionally, we propose a novel interference point loss to enhance the model's anti-interference ability. CONet is simple yet effective: it achieves 71.6 AP on the CrowdPose dataset, outperforming the previous state-of-the-art model by 1.6 AP. These results demonstrate its effectiveness in improving the accuracy of human pose estimation in crowded and occluded scenes, and highlight its potential for real-world applications where accurate pose estimation is crucial, such as surveillance, sports analysis, and human–computer interaction.

