Abstract
Multi-person Pose Estimation (MPPE) aims to reconstruct human poses by locating and connecting the keypoints of each individual in an input image. Because human poses vary widely and scenes are complex, MPPE relies on both local details and global structure, and the absence of either can produce deformed poses. The emergence of the Transformer has significantly improved MPPE performance. However, because self-attention computes an attention score between every pair of positions, current Transformer-based MPPE methods incur computational complexity that is quadratic in the number of positions. To address these issues, this paper proposes a novel pose estimation model, MRSAPose. MRSAPose uses Multi-level Routing Sparse Attention (MRSA) to dynamically select relevant regions for attention, reducing computational complexity and mitigating the impact of irrelevant regions. Furthermore, MRSAPose combines MRSA with Recursive Residual Gated Convolution (Res-gnConv) to construct a Transformer-CNN Parallel Interaction Block (T-CP block), which learns global and local information in parallel. By relying on multi-level routing and on high-order spatial interactions obtained through recursive processing of adjacent features, the T-CP block helps MRSAPose effectively alleviate occlusion and misalignment in pose estimation. On multiple challenging keypoint datasets, MRSAPose outperforms current state-of-the-art algorithms and performs particularly well in crowded and occluded scenes.
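To make the idea of routing attention to selected regions concrete, the following is a minimal single-level sketch in NumPy. It is not the MRSA module described in the paper: the function name `routed_sparse_attention`, the contiguous equal-size regions, the mean-pooled region descriptors, and the identity Q/K/V projections are all assumptions made for illustration.

```python
import numpy as np

def routed_sparse_attention(x, num_regions, top_k):
    """Illustrative region-routed attention (hypothetical, single level).

    x: array of shape (n_tokens, dim). Tokens are split into `num_regions`
    contiguous, equal-size regions. Each region attends only to the tokens
    of its `top_k` most similar regions instead of to all tokens.
    """
    n, d = x.shape
    assert n % num_regions == 0, "tokens must divide evenly into regions"
    r = n // num_regions  # tokens per region

    # Assumption: identity projections; a real model would learn Q, K, V.
    q = x.reshape(num_regions, r, d)
    k = x.reshape(num_regions, r, d)
    v = x.reshape(num_regions, r, d)

    # Coarse routing: compare mean-pooled region descriptors.
    affinity = q.mean(axis=1) @ k.mean(axis=1).T            # (R, R)
    routed = np.argsort(-affinity, axis=1)[:, :top_k]       # (R, top_k)

    # Gather keys/values of the routed regions for every query region.
    k_sel = k[routed].reshape(num_regions, top_k * r, d)    # (R, top_k*r, d)
    v_sel = v[routed].reshape(num_regions, top_k * r, d)

    # Fine-grained attention restricted to the gathered tokens.
    # Score matrix has n * top_k * r entries instead of n * n.
    scores = q @ k_sel.transpose(0, 2, 1) / np.sqrt(d)      # (R, r, top_k*r)
    scores = scores - scores.max(axis=-1, keepdims=True)    # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = weights @ v_sel                                   # (R, r, d)
    return out.reshape(n, d), routed
```

With `top_k` equal to `num_regions`, every region attends to all tokens and the result matches dense self-attention. Smaller values of `top_k` shrink the score matrix proportionally, which is the source of the complexity reduction the abstract refers to.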