Abstract
Human pose estimation is an important task in computer vision and an essential step toward machine understanding of human motion and behavior. However, accurate localization of keypoints for small individuals in multi-person images is often ignored by current methods, which limits gains in accuracy. In addition, the mainstream method of translating a predicted heatmap into a coordinate in the original image space is too coarse, and this coarseness further degrades keypoint localization. To address these challenges, we propose an adaptive human body size module (AHBZM), a spatial selective attention module (SSAM), and a more accurate heatmap translator (MAHT) for human pose estimation. The proposed AHBZM uses trainable parameters to select a more appropriate multi-scale fusion strategy, refining keypoint localization across different body sizes. To further improve localization, SSAM captures target spatial information during feature fusion. The proposed MAHT adds pixel offsets more accurately when translating heatmap coordinates into original image coordinates, while more closely associating the global maximum in the heatmap with the surrounding local maxima. Experimental results show that the proposed method achieves good results on two benchmark datasets, COCO and MPII. Our code is available at: https://github.com/illusory2333/Adaptive-module-and-heatmap-translator.
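For context, the coarse heatmap-to-coordinate translation that MAHT refines is typically the standard argmax-plus-quarter-pixel-shift decoding used by heatmap-based pose estimators. The sketch below illustrates that common baseline (not the proposed MAHT itself); the function name, the NumPy heatmap input, and the `stride` parameter are illustrative assumptions.

```python
import numpy as np

def decode_heatmap(heatmap, stride=4.0):
    """Common baseline decoding: take the argmax of the heatmap, shift a
    quarter pixel toward the higher of the two neighbors on each axis,
    then scale by the network stride to reach original-image coordinates.
    This is the coarse translation step the abstract describes."""
    h, w = heatmap.shape
    y, x = divmod(int(np.argmax(heatmap)), w)
    fx, fy = float(x), float(y)
    # Quarter-pixel offset toward the larger neighboring activation.
    if 0 < x < w - 1:
        fx += 0.25 * np.sign(heatmap[y, x + 1] - heatmap[y, x - 1])
    if 0 < y < h - 1:
        fy += 0.25 * np.sign(heatmap[y + 1, x] - heatmap[y - 1, x])
    # stride = input resolution / heatmap resolution (often 4).
    return fx * stride, fy * stride
```

Because the offset is a fixed quarter pixel regardless of the actual activation values around the peak, the recovered coordinate is quantized; methods like the proposed MAHT aim to use the surrounding local maxima to place the keypoint more precisely.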