Abstract

Human pose estimation is an important task in computer vision and an essential step for computers to understand human motion and behavior. However, accurate localization of keypoints for small individuals in multi-person images is often neglected by current methods, which limits gains in accuracy. In addition, the mainstream approach of translating a predicted heatmap into a coordinate in the original image space is too coarse, and this coarseness further degrades keypoint localization. To address these challenges, we propose an adaptive human body size module (AHBZM), a spatial selective attention module (SSAM), and a more accurate heatmap translator (MAHT) for human pose estimation. The proposed AHBZM uses trainable parameters to select a more appropriate multi-scale fusion scheme, refining keypoint localization across different body sizes. To further improve keypoint localization, SSAM captures target spatial information during feature fusion. The proposed MAHT adds pixel offsets more accurately when translating heatmap coordinates into original image coordinates, while associating the global maximum in the heatmap more closely with the surrounding local maxima. Experimental results show that the proposed method achieves competitive results on the COCO and MPII benchmark datasets. Our code is available at: https://github.com/illusory2333/Adaptive-module-and-heatmap-translator.
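To make the coordinate-translation step concrete, the following is a minimal sketch of a common heatmap decoding scheme with a sub-pixel offset toward the higher neighboring activation. The function name `decode_heatmap`, the default `stride`, and the quarter-pixel `shift` are illustrative assumptions, not the exact MAHT procedure described in the paper.

```python
import numpy as np

def decode_heatmap(heatmap, stride=4, shift=0.25):
    """Translate a single-keypoint heatmap into original-image coordinates.

    heatmap: (H, W) array of predicted keypoint confidences.
    stride:  downsampling factor between input image and heatmap
             (assumed value; depends on the backbone).
    shift:   magnitude of the sub-pixel offset (assumed quarter pixel).
    """
    h, w = heatmap.shape
    # The global maximum gives the coarse integer location on the heatmap grid.
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    px, py = float(x), float(y)
    # Nudge the coordinate toward the larger neighboring activation,
    # using local values around the global maximum as a sub-pixel cue.
    if 0 < x < w - 1:
        px += shift * np.sign(heatmap[y, x + 1] - heatmap[y, x - 1])
    if 0 < y < h - 1:
        py += shift * np.sign(heatmap[y + 1, x] - heatmap[y - 1, x])
    # Map heatmap-grid coordinates back to original-image pixels.
    return px * stride, py * stride
```

Without the offset, every predicted coordinate snaps to the heatmap grid, so the quantization error grows with the stride; incorporating neighboring activations around the global maximum recovers part of that sub-pixel information.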
