Abstract

Human pose estimation is an important task in computer vision and an essential step for computers to understand human motion and behavior. However, accurate localization of keypoints for small individuals in multi-person images is often neglected by current methods, which limits gains in accuracy. In addition, the mainstream approach of translating a predicted heatmap into a coordinate in the original image space is too coarse, and this coarseness further degrades keypoint localization. To address these challenges, we propose an adaptive human body size module (AHBZM), a spatial selective attention module (SSAM), and a more accurate heatmap translator (MAHT) for human pose estimation. The proposed AHBZM uses trainable parameters to select a more appropriate multi-scale fusion scheme, refining keypoint localization across different body sizes. To further improve keypoint localization, SSAM captures target spatial information during feature fusion. The proposed MAHT adds pixel offsets more accurately when translating heatmap coordinates into original image coordinates, while associating the global maximum in the heatmap more closely with the surrounding local maxima. Experimental results show that the proposed method achieves competitive results on the COCO and MPII benchmark datasets. Our code is available at: https://github.com/illusory2333/Adaptive-module-and-heatmap-translator.
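To make the coordinate-translation step concrete, the following is a minimal sketch of a common heatmap decoding scheme with a sub-pixel offset toward the higher neighboring activation. The function name `decode_heatmap`, the default `stride`, and the quarter-pixel `shift` are illustrative assumptions, not the exact MAHT procedure described in the paper.

```python
import numpy as np

def decode_heatmap(heatmap, stride=4, shift=0.25):
    """Translate a single-keypoint heatmap into original-image coordinates.

    heatmap: (H, W) array of predicted keypoint confidences.
    stride:  downsampling factor between input image and heatmap
             (assumed value; depends on the backbone).
    shift:   magnitude of the sub-pixel offset (assumed quarter pixel).
    """
    h, w = heatmap.shape
    # The global maximum gives the coarse integer location on the heatmap grid.
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    px, py = float(x), float(y)
    # Nudge the coordinate toward the larger neighboring activation,
    # using local values around the global maximum as a sub-pixel cue.
    if 0 < x < w - 1:
        px += shift * np.sign(heatmap[y, x + 1] - heatmap[y, x - 1])
    if 0 < y < h - 1:
        py += shift * np.sign(heatmap[y + 1, x] - heatmap[y - 1, x])
    # Map heatmap-grid coordinates back to original-image pixels.
    return px * stride, py * stride
```

Without the offset, every predicted coordinate snaps to the heatmap grid, so the quantization error grows with the stride; incorporating neighboring activations around the global maximum recovers part of that sub-pixel information.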
