Stacked hourglass networks based on polarized self-attention for human pose estimation

Xiaoxia Luo, Feibiao Li, Wei Qin

doi:10.1117/12.2622889

Abstract

Human pose estimation locates the key points of the human body in an image. The stacked hourglass network combines top-down and bottom-up feature extraction and achieves good results on this task. However, resolution is lost during feature extraction, which has a considerable impact on the localization of the key points. This paper therefore incorporates a polarized self-attention (PSA) mechanism into the stacked hourglass network. A PSA module is added before the second convolution of the basic residual block, and before the max-pool down-sampling and the nearest-neighbor up-sampling in each stage of the hourglass module. The spatial and channel branches of the PSA maintain a high feature resolution, thereby improving the accuracy with which the model localizes human key points. Experiments on the MPII human pose estimation data set show that the improved network reaches a PCKh@0.5 of 92.6%, which is 1.5 percentage points higher than the original model, supporting the correctness and effectiveness of the network.
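For readers unfamiliar with the reported metric, the following is a minimal sketch of how a PCKh@0.5 score is commonly computed: a predicted key point counts as correct when its distance to the ground truth is at most 0.5 times the head segment length of that person. The function name `pckh`, its argument layout, and the toy coordinates are assumptions made for illustration; the head size is assumed to be supplied by the caller, and this is not the evaluation code used by the authors.

```python
import math


def pckh(preds, targets, head_sizes, alpha=0.5, visible=None):
    """Fraction of key points lying within alpha * head size of the ground truth.

    preds, targets: one list per image, each holding (x, y) key points.
    head_sizes:     one head segment length (in pixels) per image (assumed given).
    visible:        optional per-image lists of booleans; key points marked
                    False are skipped (hypothetical convenience argument).
    """
    correct = 0
    total = 0
    for i, (pred_img, target_img, head) in enumerate(zip(preds, targets, head_sizes)):
        for j, ((px, py), (tx, ty)) in enumerate(zip(pred_img, target_img)):
            if visible is not None and not visible[i][j]:
                continue
            # Euclidean distance between prediction and ground truth.
            dist = math.hypot(px - tx, py - ty)
            if dist <= alpha * head:
                correct += 1
            total += 1
    return correct / total if total else 0.0


# Toy example: one image, three key points, head size 20 px (threshold 10 px).
preds = [[(10.0, 10.0), (50.0, 52.0), (90.0, 120.0)]]
targets = [[(10.0, 12.0), (50.0, 50.0), (90.0, 100.0)]]
head_sizes = [20.0]
score = pckh(preds, targets, head_sizes)  # two of three points fall within 10 px
```

Normalizing by head size rather than by a fixed pixel distance keeps the threshold comparable across people who appear at different scales in the image.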

Full Text